Abstract. Recently there have been several attempts to provide a complete set of
generators of the ideal of the algebraic variety associated to a phylogenetic tree
evolving under an algebraic model. These algebraic varieties have proven to
be useful in phylogenetics. In this paper we prove that, for the purposes of
phylogenetic reconstruction, it is enough to consider the generators coming from
the edges of the tree, the so-called edge invariants. This is the algebraic analogue
of Buneman's Splits Equivalence Theorem. The interest of this result lies in its
potential applications in phylogenetics for widely used evolutionary models such as
the Jukes-Cantor, Kimura 2- and 3-parameter, and general Markov models.

1. Introduction 

Algebraic evolutionary models and the algebraic varieties associated to a tree
evolving under these models have been an interdisciplinary area of research with
successful results in the last five years. The use of polynomials in phylogenetic
reconstruction was first introduced by the biologists Cavender and Felsenstein [CF87]
and Lake [Lak87]. Because of their interest in phylogenetics, there have been several
attempts to provide a set of generators of the ideal of these algebraic varieties (see
for example [AR07], [SS05], [DK09], [CS05]). On the other hand, the authors of
this paper have proven in [CFS07] that these generators can be successfully used in
phylogenetic reconstruction. In other words, methods based on algebraic geometry
can lead to the inference of the phylogenetic tree of current biological species. As
in [CFS08], our aim in the present paper is to study these algebraic varieties
with a view towards their real applications in phylogenetics.

Algebraic evolutionary models include the algebraic versions of widely used models
in biology such as the Jukes-Cantor model [JC69], the Kimura 2- and 3-parameter models
(cf. [Kim80], [Kim81]) and the general Markov model (cf. [BH87]). These models
belong to what Draisma and Kuttler call equivariant models in [DK09] (see Section
2 for the precise definition). Following ideas of Allman and Rhodes and using
representation theory, Draisma and Kuttler have recently given an algorithm to obtain
the generators of the ideal of the algebraic varieties associated to a tree of n species
evolving under an equivariant model from the generators of the ideal associated to



Both authors are partially supported by Ministerio de Educación y Ciencia, MTM2009-14163-C02-02,
and Generalitat de Catalunya, 2009 SGR 1284.




MARTA CASANELLAS AND JESUS FERNANDEZ-SANCHEZ 



a tree of 3 species and certain minors of matrices (the so-called edge invariants).
Nevertheless, a set of generators for trees of 3 species is not known for certain models
such as the general Markov model (this is the so-called Salmon Conjecture) or the
strand symmetric model (see [CS05]). Therefore, a complete list of generators for a
tree of n species evolving under these models cannot be given at this point.

The goal of this paper is to prove that, whereas mathematically speaking it is
interesting to know a set of generators of the ideal of these varieties, for biological
purposes it is enough to consider certain generators. More precisely, the edge
invariants mentioned above suffice to reconstruct the phylogenetic tree of any number of
species (see the Theorem stated later in this Introduction, or Theorem 4.4). This is a
natural result in view of the combinatorial theorem of Buneman, which says that a tree
can be recovered from the set of splits of the set of leaves induced by its edges (cf.
[Bun71], [PS05, Theorem 2.35]; see also Theorem 4.1 below).

Our inspiration goes back to the work [Fel91] of the biologist Joe Felsenstein, who uses the
term phylogenetic invariants for those polynomial expressions that vanish on the expected
frequencies of any sequences arising from one tree topology but are nonzero for at
least one tree of another topology. A tree topology in this setting is the topology
of the tree graph labelled at the leaves with the names of the species. Algebraically
speaking, he calls phylogenetic invariants those elements of the ideal associated
to a phylogenetic tree that allow one to distinguish it from other tree topologies. In the
mathematical context, the name phylogenetic invariants has usually been given to
all elements of the ideal; see for instance the work of Allman and Rhodes [AR07].
We want to go back to the original meaning of phylogenetic invariants because our
focus is on the applications of algebraic geometry to the reconstruction of
the tree topology of current species. Therefore, we are mainly interested in precisely
those elements of the ideal that provide information for phylogenetic reconstruction
purposes. In other words, we are interested in phylogenetic invariants (i.e., polynomials
in the ideal of one tree topology of n species but not in the ideal of all other tree
topologies on the same number of species), and the word invariants alone shall mean
any element of the ideal. In colloquial language, the main result of this paper is
that, for phylogenetic reconstruction purposes, the relevant phylogenetic invariants
are the edge invariants mentioned above.

As our aim is to study these varieties with regard to their applications in biology,
let us roughly explain here how algebraic geometry enters phylogenetic
reconstruction. Let n be a number of biological species and assume that we are given
an alignment of DNA sequences corresponding to them (the definition of alignment
is rather technical, but it refers to a collection of n-tuples in $\{A,C,G,T\}^n$ that will
also be called columns of the alignment). Each column stands for sites in the n DNA
sequences that have evolved from the same nucleotide in the common ancestor.
We assume that these species are leaves of a phylogenetic tree T evolving under
a probabilistic model $\mathcal{M}$ (in this paper we will only consider equivariant models,
see Definition 2.4 for the precise definition). It is usual to assume as well that all



RELEVANT PHYLOGENETIC INVARIANTS. 






columns of the alignment behave independently and identically (i.e., all sites of the
DNA sequences of these species evolve in the same way and independently of the
other sites). Associated to this model $\mathcal{M}$ there is a parameterization map giving
the joint distribution of states A, C, G, T at the leaves of T as polynomial functions
of continuous parameters. Therefore, as an alignment of DNA sequences evolving
under this model on a tree T is a collection of observations of states at the leaves,
it corresponds to a point in the image of this parameterization map. The algebraic
variety $V_{\mathcal{M}}(T)$ associated to T is the closure of this image (see Definition 2.7). In
real life, alignments are not points of $V_{\mathcal{M}}(T)$, but they are close to $V_{\mathcal{M}}(T)$ if the
model fits the data reasonably well. Therefore the idea behind phylogenetic algebraic
geometry is to use the ideal of $V_{\mathcal{M}}(T)$ in order to infer the tree topology T. See
[CGS05] for an algorithm of phylogenetic reconstruction based on the generators of
this ideal and [CFS07] for tests of it on simulated data.
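
As an illustration of how an alignment gives a point of the ambient space, the following minimal Python sketch computes the relative frequencies of the columns of an alignment. The function name `pattern_frequencies` and the toy alignment are hypothetical and serve only as an example; real alignments would of course be read from data files.

```python
from collections import Counter
from itertools import product

def pattern_frequencies(sequences):
    """Relative frequencies of the columns of an alignment.

    `sequences` is a list of n aligned strings over A, C, G, T (one per species).
    Returns a dict mapping each of the 4**n patterns to its relative frequency.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("aligned sequences must have the same length")
    # Each column of the alignment is an n-tuple of nucleotides.
    counts = Counter("".join(column) for column in zip(*sequences))
    n = len(sequences)
    return {"".join(pattern): counts["".join(pattern)] / length
            for pattern in product("ACGT", repeat=n)}

# Hypothetical toy alignment of 3 species and 8 sites.
toy = ["ACGTACGA", "ACGTACGT", "ACCTACGA"]
freqs = pattern_frequencies(toy)
```

The resulting vector has $4^n$ coordinates that sum to 1; it is this empirical point that is compared with the varieties $V_{\mathcal{M}}(T)$.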

Up to now, all attempts have focused on giving a complete set of generators of
$I(V_{\mathcal{M}}(T))$, but our approach is more practical. As biologists assume that the model
$\mathcal{M}$ fits the data, the point given by an alignment is assumed to be close
to the union of all varieties $V_{\mathcal{M}}(T)$ for trees of n species evolving under the model $\mathcal{M}$.
Hence, we only need to know how a particular variety $V_{\mathcal{M}}(T_0)$ is defined inside
$\bigcup_T V_{\mathcal{M}}(T)$, where the union runs over all trivalent tree topologies T on n species. In
this algebraic geometry context our main result (Theorem 4.4) can be summarized
in the following way.

Theorem. Let $\mathcal{T}$ be the set of trivalent tree topologies on n leaves and let $\mathcal{M}$
be an equivariant model. For each tree topology $T \in \mathcal{T}$ there exists an open set $U_T$
such that if p belongs to $\bigcup_{T \in \mathcal{T}} U_T$, then p belongs to a particular variety $V_{\mathcal{M}}(T_0)$ if
and only if p belongs to the zero set of the edge invariants of $T_0$.

This result also has other consequences in phylogenetics. For instance, it says
that edge invariants should not be used for model fitting tests (see [GP04] for an
algebraic introduction to the subject) or for the study of the identifiability of the
continuous parameters of the model (see [AR08] for an explanation of this terminology),
because they are indeed phylogenetic invariants. Instead, they should be used in
discussing the identifiability of the tree topology of such models (see Corollary 3.9), as
was already done by Allman and Rhodes in [AR06]. We also find invariants (not
phylogenetic invariants) that could potentially be used for model fitting tests, that
is, linear polynomials that can be used for choosing the evolutionary model that
best fits the data (see Remark 2.8).

Moreover, our main theorem allows one to give the exact degrees of those
generators relevant in phylogenetics (see Corollary 4.12), whereas the degrees of a complete set
of generators for the general Markov or strand symmetric models are still unknown.
It is worth highlighting that these degrees can be computed just by knowing the
model we are interested in, and they do not depend on the topology or the number
of leaves we are considering.
Here we outline the structure of the paper. In Section 2 we adapt the setting and
notation of [DK09] to our convenience. We also recall and prove basic facts of
group representation theory for readers who are not familiar with it. Section 3 is devoted
to proving a technical result that will be the key to the proof of our main theorem.
Roughly speaking, this result proves that edge invariants are indeed phylogenetic
invariants for any equivariant model. This was already known for the general Markov
model by Allman and Rhodes (see for instance [AR06]) and Eriksson [Eri05], but it is
new for the remaining equivariant models. The proof relies on providing a formula
for the rank of the flattening of the tensor $\Psi_T$ along any bipartition of the set of
leaves. In Section 4 we prove Theorem 4.4, our main result. In the last section
we provide an exhaustive collection of examples on how to compute the required
edge invariants for the most used evolutionary models: the Jukes-Cantor, Kimura 2-
and 3-parameter, strand symmetric and general Markov models. We compute them
explicitly for quartet trees. It is our aim to make this section clear enough for
biomathematicians; for example, we relate the invariants used by biologists like
Lake (see [Lak87]) to the more technical definition of edge invariants (see the end
of Subsection 5.5). We also connect our edge invariants to Fourier coordinates, which
are more familiar to readers used to group-based models. In particular, the
reader can see which Fourier coordinates are actually of interest in
biology, as not all of them are needed for phylogenetic reconstruction. This section
is also a useful illustration of the technical definitions given in Sections 2 and 3, so it is
a good idea to combine the reading of both sections with Section 5.

Acknowledgments: We would like to thank Josep Elgueta and Jeremy Sumner 
for useful comments on group representation theory. 

2. Preliminaries 

A tree is a connected finite graph without cycles, consisting of vertices and edges. 
Given a tree T, we write V(T) and E(T) for the sets of vertices and edges of T, respectively. The
degree of a vertex is the number of edges incident to it. The set V(T) splits into
the set of leaves L(T) (vertices of degree one) and the set of interior vertices Int(T):
$V(T) = L(T) \cup \mathrm{Int}(T)$. One says that a tree is trivalent if each vertex in Int(T) has
degree 3. A tree topology is the topological class of a tree where every leaf has been 
labelled. Given a subset L of L(T), the subtree induced by L is just the smallest tree 
composed of the edges and vertices of T in any path connecting two leaves in L. 

Given an ordered set $B = \{b_1, b_2, \dots, b_k\}$, we define $W = \langle B \rangle_{\mathbb{C}}$ as the $\mathbb{C}$-vector
space generated by the elements of B. For biological applications, the most common
values of k are 2, 4 or 20 (for example, $B = \{A, C, G, T\}$). Now, given a subgroup G
of the group $\mathfrak{S}_k$ of permutations of k elements, we consider the restriction to G of
the natural linear representation

$$\rho : \mathfrak{S}_k \to GL(W)$$






given by the permutation of the elements of B. This representation induces a
G-module structure on W by taking

$$g \cdot u := \rho(g)(u) \in W.$$

In fact, $\rho$ induces a G-module structure on any tensor power of W, say
$\otimes^l W := W \otimes \dots \otimes W$, by taking

(2.1)   $g \cdot (u_1 \otimes \dots \otimes u_l) := g \cdot u_1 \otimes \dots \otimes g \cdot u_l.$

Henceforth, any tensor power of W will be implicitly considered as a G-module with
this action.
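
The action (2.1) can be made concrete with a short Python sketch. It stores a tensor as a dictionary from tuples of basis letters to coefficients, which is a representation chosen here only for illustration; the helper names `perm` and `act` are hypothetical, and the two permutations used are the generators of the Kimura 3-parameter group, taken as an example.

```python
B = "ACGT"

def perm(cycles):
    """Permutation of B given as a list of cycles, e.g. ["AC", "GT"]."""
    g = {b: b for b in B}
    for cyc in cycles:
        for i, b in enumerate(cyc):
            g[b] = cyc[(i + 1) % len(cyc)]
    return g

def act(g, tensor):
    """Action (2.1) on a tensor stored as {tuple of basis letters: coefficient}."""
    out = {}
    for word, coeff in tensor.items():
        image = tuple(g[b] for b in word)  # g acts on every tensor factor
        out[image] = out.get(image, 0) + coeff
    return out

# Example generators: (AC)(GT) and (AG)(CT).
generators = [perm(["AC", "GT"]), perm(["AG", "CT"])]

# The tensor sum_b b (x) b is fixed by every permutation of B.
diagonal = {(b, b): 1 for b in B}
# The basis tensor A (x) C is sent by (AC)(GT) to C (x) A.
single = {("A", "C"): 1}
moved = act(generators[0], single)
```

In this encoding a tensor is G-invariant exactly when `act(g, tensor) == tensor` for every generator g of G.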

From now on, we fix an ordered set $B = \{b_1, b_2, \dots, b_k\}$, the space $W = \langle B \rangle_{\mathbb{C}}$ and a subgroup
$G \subset \mathfrak{S}_k$ acting on W as above.

Definition 2.1. A phylogenetic tree on (G, W) is a tree where every vertex p has a
$\mathbb{C}$-vector space $W_p \cong W$ associated to it, regarded as a representation of G via the
map $\rho$ defined above.

Notation. The scalar product on $W_p$ with orthonormal basis $B_p$ will be denoted by $(\cdot \mid \cdot)_p$.
This gives a canonical isomorphism from $W_p$ to $W_p^*$.

Notice that the scalar product $(\cdot \mid \cdot)_p$ is G-invariant, that is, $(g \cdot u \mid g \cdot v)_p = (u \mid v)_p$
for every $u, v \in W_p$ and any $g \in G$.

Definition 2.2. Given a phylogenetic tree T on (G, W), a T-tensor is any element
of

$$\mathcal{L}(T) := \bigotimes_{p \in L(T)} W_p.$$

A G-tensor on T is a T-tensor invariant under the action defined in (2.1). The set of
G-tensors will be denoted by $\mathcal{L}(T)^G$.

From now on, if $l > 0$ we write $\otimes^l W = W \otimes \overset{l}{\cdots} \otimes W$. We denote by $B(\otimes^l W)$ the
basis of $\otimes^l W$ given by

$$\{u_{i_1} \otimes \dots \otimes u_{i_l} \mid u_{i_j} \in B\}.$$

This is an orthonormal basis with respect to the scalar product on $\otimes^l W$ given by
$(\otimes_p u_p \mid \otimes_p v_p) = \prod_p (u_p \mid v_p)_p$. If $L \subset L(T)$ is a subset of L(T) and $l = \#L$, then we
shall use the notation $\otimes_L W$ for the space $\bigotimes_{p \in L} W_p \cong \otimes^l W$.

Definition 2.3. Let T be a phylogenetic tree on (G, W) and assume that a
distinguished vertex of T (the root) is given, inducing an orientation on all the edges
of T: write $e_0$ and $e_1$ for the origin and final vertices of the edge e, respectively. A
G-evolutionary presentation of T is a collection of tensors $\{A_{e_0,e_1}\}_{e \in E(T)}$ where each
$A_{e_0,e_1}$ is a G-invariant element of the G-module $W_{e_0} \otimes W_{e_1}$. The space of G-invariant
elements of $W_{e_0} \otimes W_{e_1}$ is denoted by $(W_{e_0} \otimes W_{e_1})^G$.

1 Notice that evolutionary presentations are called representations in [DK09]. We prefer this
terminology to avoid confusion with representation theory.






If another root (orientation) on T is considered, inducing the opposite orientation
on some edge $e \in E(T)$, we define $A_{e_1,e_0} := A_{e_0,e_1}^{*}$, where $\cdot^{*}$ is the natural
isomorphism $(W_{e_0} \otimes W_{e_1})^G \cong (W_{e_1} \otimes W_{e_0})^G$. We will often identify $\mathrm{Hom}_G(W_{e_0}, W_{e_1})$
with $(W_{e_0} \otimes W_{e_1})^G$ via $W_{e_0}^* \cong W_{e_0}$. With this convention, G-evolutionary
presentations on a tree do not depend on the orientation chosen. The space of all
G-evolutionary presentations of T is the parameter space, denoted by

$$\mathrm{Par}_G(T) = \prod_{e \in E(T)} (W_{e_0} \otimes W_{e_1})^G.$$

Notice that a G-evolutionary presentation of T induces by
restriction a G-evolutionary presentation of any subtree of T.

The spaces $\mathrm{Par}_G(T)$, $\mathcal{L}(T)$ and $\mathcal{L}(T)^G$ are irreducible affine spaces with
their Zariski topology.

Definition 2.4. An equivariant model of evolution is a pair (G, W) as above, with
$W = \langle b_1, \dots, b_k \rangle_{\mathbb{C}}$ and $G \subset \mathfrak{S}_k$. Trees evolving under this equivariant model are phylogenetic
trees on (G, W) together with the space of G-evolutionary presentations.

Equivariant models of evolution include the general Markov model [BH87] when
$G = \{\mathrm{id}\}$, the strand symmetric model [CS05] when $G = \langle (AT)(CG) \rangle$, and the
algebraic versions of the Kimura 3-parameter model [Kim81] ($G = \langle (AC)(GT), (AG)(CT) \rangle$), the Kimura
2-parameter model [Kim80] ($G = \langle (ACGT), (AG) \rangle$) and the Jukes-Cantor model [JC69] ($G = \mathfrak{S}_4$).
We refer the reader to Section 5 for specific computations with these models.
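
To get a feeling for the groups listed above, the following Python sketch generates each of them from its generators and records its order. The helper names are hypothetical, and the two generators used for the full symmetric group (a transposition and a 4-cycle) are a choice made here, since the text only specifies the group itself.

```python
B = "ACGT"

def perm(cycles):
    """Permutation of B as a tuple of images, from cycles such as ["AC", "GT"]."""
    image = {b: b for b in B}
    for cyc in cycles:
        for i, b in enumerate(cyc):
            image[b] = cyc[(i + 1) % len(cyc)]
    return tuple(image[b] for b in B)

def compose(g, h):
    """The permutation 'g after h', both given as tuples of images."""
    return tuple(g[B.index(x)] for x in h)

def generate(generators):
    """Closure of a set of generators under composition (a finite group)."""
    identity = tuple(B)
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            new = compose(s, g)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

MODELS = {
    "general Markov": [],
    "strand symmetric": [perm(["AT", "CG"])],
    "Kimura 3-parameter": [perm(["AC", "GT"]), perm(["AG", "CT"])],
    "Kimura 2-parameter": [perm(["ACGT"]), perm(["AG"])],
    # Assumed generators of the symmetric group on B.
    "Jukes-Cantor": [perm(["AC"]), perm(["ACGT"])],
}
orders = {name: len(generate(gens)) for name, gens in MODELS.items()}
```

The orders obtained are 1, 2, 4, 8 and 24, respectively: the larger the group, the more symmetric (and the more restrictive) the model.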

Following [AR07] and [DK09] we now present a fundamental operation * on
phylogenetic trees, G-evolutionary presentations and T-tensors. To this aim, we first
introduce a bilinear operation $(\cdot \mid \cdot)$ between tensors induced by the bilinear form
$(\cdot \mid \cdot)$ on W. Let X and Y be two finite sets of indices with $Z = X \cap Y \neq \emptyset$, and
such that every p in X or Y has a vector space $W_p \cong W$ associated to it. Define

$$(\cdot \mid \cdot) : \otimes_X W \times \otimes_Y W \to \otimes_{(X \cup Y) \setminus Z} W$$

$$(\otimes_{p \in X} v_p, \otimes_{p \in Y} u_p) \mapsto (\otimes_{p \in Z} v_p \mid \otimes_{p \in Z} u_p) \, \big( (\otimes_{p \in X \setminus Z} v_p) \otimes (\otimes_{p \in Y \setminus Z} u_p) \big).$$

Now, we define the * operation:

* for trees: Given l spaced trees $T_1, \dots, T_l$ whose vertex sets only share a common
leaf q with common space $W_q$ and common basis $B_q$, we construct a new spaced tree
$*_i T_i$ obtained by gluing the $T_i$'s along q; the space at a vertex of $*_i T_i$ coming from
$T_i$ is just the space attached to it in $T_i$, with the same distinguished basis.

* for G-evolutionary presentations: Given G-evolutionary presentations
$A_i \in \mathrm{Par}_G(T_i)$ for i = 1, ..., l, we denote by $*_i A_i$ the G-evolutionary presentation of $*_i T_i$
built up from the $A_i$.

* for tensors: Now let $\psi_i$ be a $T_i$-tensor, for all i. Then we obtain a T-tensor, with $T = *_i T_i$, as
follows:

$$*_i \psi_i := \sum_{b \in B_q} \otimes_i (b \mid \psi_i).$$

Although this * operator is not a binary operator extended to several factors, when
convenient we will write $T_1 * \dots * T_l$ for $*_i T_i$ and $\psi_1 * \dots * \psi_l$ for $*_i \psi_i$.
Notation 2.5. A slightly more general *-operation will be needed in the forthcoming
Section 3. Given $\varphi_1 \in (\otimes_X W)^G$ and $\varphi_2 \in (\otimes_Y W)^G$, define

$$\varphi_1 * \varphi_2 := (\varphi_1 \mid \varphi_2).$$

Clearly, if $T_1$ and $T_2$ are two phylogenetic trees that share a common leaf q, then
this definition agrees with the *-operation defined above.

Now we describe a basic procedure that allows us to associate a T-tensor to any
G-evolutionary presentation of T. We proceed inductively on the number of edges
to define $\Psi_T : \mathrm{Par}_G(T) \to \mathcal{L}(T)$. Let $A \in \mathrm{Par}_G(T)$. First, if T has a single edge {p, q},
then $\Psi_T(A) := A_{q,p}$ is an element of $\mathcal{L}(T) = W_q \otimes W_p$. If T has more than one edge,
then let q be any internal vertex of T. Two vertices $p, q \in T$ are adjacent if they are
joined by an edge; in this case, we write $p \sim q$. We can then write $T = *_{p \sim q} T_p$, where
$T_p$ is the branch of T around q containing p, constructed by taking the connected
component of $T \setminus \{q\}$ containing p, and reattaching q to p. The G-evolutionary
presentation A induces G-evolutionary presentations $A_p$ of the $T_p$, and by induction
$\Psi_{T_p}(A_p)$ has been defined. We now set

$$\Psi_T(A) := *_{p \sim q} \Psi_{T_p}(A_p).$$

This definition is independent of the choice of q and the formula is also valid if
q is actually a leaf (see [DK09] for details). Moreover, we have that the map
$\Psi_T : \mathrm{Par}_G(T) \to \mathcal{L}(T)$ is G-equivariant (see [DK09, Lemma 5.1]).

Remark 2.6. Notice that the above map $\Psi_T : \mathrm{Par}_G(T) \to \mathcal{L}(T)^G$ is continuous
in the Zariski topology.

Definition 2.7. The algebraic variety associated to a phylogenetic tree T on (G, W)
is

$$V_G(T) := \overline{\{\Psi_T(A) \mid A \in \mathrm{Par}_G(T)\}} \subset \mathcal{L}(T),$$

where the closure is taken in the Zariski topology.

Notice that we have $V_G(T) \subset \mathcal{L}(T)^G$. From now on, we will consider $\mathcal{L}(T)^G$
as the ambient space of $V_G(T)$, and $I(T)$ will be the ideal of this variety in the
corresponding coordinate ring. When the group is understood from the context, we
will use the notation V(T).

Remark 2.8. The inclusion $\mathcal{L}(T)^G \subset \mathcal{L}(T)$ is defined by a set of linear polynomials
that are also invariants of any phylogenetic tree T on (G, W) (see the Introduction
for the explanation of the word invariants). Although they are not phylogenetic
invariants, because they vanish on $V_G(T)$ for any tree T, they might be interesting
for choosing the model (G, W) that best fits the data. This application of invariants
to model fitting will be studied in a forthcoming paper.
Figure 1.

Example 2.9. If we consider $B = \{A, C, G, T\}$ and $G = \{\mathrm{id}\} \subset \mathfrak{S}_4$, we obtain the
general Markov model. In this case, $(W_{e_0} \otimes W_{e_1})^G = W_{e_0} \otimes W_{e_1}$ and no restrictive
conditions are imposed on the parameters of the model. Thus, by taking the basis B in W, a
G-evolutionary presentation can be identified with a collection of
matrices $\{A_e\}_{e \in E(T)}$, and the parameters of the model are the entries of these matrices.
When these entries are real non-negative values and the columns sum to 1, they
can be understood as the probabilities of substitution among the 4 nucleotides:

$$A_e = \begin{pmatrix}
P(A \mid A, e) & P(A \mid C, e) & P(A \mid G, e) & P(A \mid T, e) \\
P(C \mid A, e) & P(C \mid C, e) & P(C \mid G, e) & P(C \mid T, e) \\
P(G \mid A, e) & P(G \mid C, e) & P(G \mid G, e) & P(G \mid T, e) \\
P(T \mid A, e) & P(T \mid C, e) & P(T \mid G, e) & P(T \mid T, e)
\end{pmatrix}.$$

Here $P(X \mid Y, e)$ is the conditional probability that nucleotide Y at the parent species
$e_0$ is substituted along the edge e by nucleotide X at its child species $e_1$. In the
terminology introduced above, $P(X \mid Y, e)$ is the coordinate of $A_e \in W_{e_0} \otimes W_{e_1} = W \otimes W$
corresponding to $Y \otimes X$. Given a tree T, the G-equivariant map $\Psi_T$ is the
parameterization that associates to each parameter set the vector of expected pattern
frequencies $p = (p_{X_1 X_2 \dots X_n})_{X_i \in B}$ (that is, $p_{X_1 X_2 \dots X_n}$ is the probability of observing $X_1 X_2 \dots X_n$ at
the leaves of T). For example, if T is a 4-leaf tree as in Figure 1, then

$$\Psi_T : \prod_{e \in E(T)} (W \otimes W) = \mathbb{C}^{80} \to \otimes^4 W = \mathbb{C}^{256}$$

$$(A_e)_e \mapsto (p_{AAAA}, p_{AAAC}, \dots, p_{TTTT})$$

and $p_{X_1 X_2 X_3 X_4}$ is the coordinate of $p \in \mathcal{L}(T) = \mathbb{C}^{256}$ corresponding to the basis vector
$X_1 \otimes X_2 \otimes X_3 \otimes X_4$. In this case, the image of $\Psi_T$ is given by

$$p_{X_1 X_2 X_3 X_4} = \sum_{Y,Z} \pi_Y A_e(Z, Y) A_{e(1)}(X_1, Y) A_{e(2)}(X_2, Y) A_{e(3)}(X_3, Z) A_{e(4)}(X_4, Z).$$

Here $\pi_Y$ is the probability of nucleotide Y occurring at the root node (see Figure 1).
Actually, in the original definition of $\Psi_T$ (see the paragraph before Remark 2.6) we gave
a reparameterization of $V_G(T)$ where we omit the parameters $\pi_Y$ for convenience.
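
The parameterization of the quartet tree can be evaluated directly. The Python sketch below assumes the labelling used in the formula of this example (leaves 1 and 2 attached to the root with state Y, leaves 3 and 4 attached to the other interior vertex with state Z); the function names and the numerical parameters are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

B = "ACGT"

def symmetric_matrix(stay):
    """Column-stochastic 4x4 matrix with `stay` on the diagonal.

    A[(X, Y)] is read as P(X | Y, e): parent state Y, child state X.
    """
    move = (1 - stay) / 3
    return {(x, y): (stay if x == y else move) for x in B for y in B}

def quartet_distribution(pi, A_int, A1, A2, A3, A4):
    """Expected pattern frequencies p_{X1 X2 X3 X4} for the quartet 12|34."""
    p = {}
    for x1, x2, x3, x4 in product(B, repeat=4):
        p[x1 + x2 + x3 + x4] = sum(
            pi[y] * A_int[(z, y)] * A1[(x1, y)] * A2[(x2, y)]
            * A3[(x3, z)] * A4[(x4, z)]
            for y in B for z in B)
    return p

# Hypothetical parameters: uniform root distribution, symmetric substitutions.
pi = {b: Fraction(1, 4) for b in B}
leaf = symmetric_matrix(Fraction(9, 10))
inner = symmetric_matrix(Fraction(7, 10))
p = quartet_distribution(pi, inner, leaf, leaf, leaf, leaf)
```

With column-stochastic matrices and a probability vector at the root, the 256 coordinates are probabilities and add up to 1; exact fractions are used so that this can be checked without rounding errors.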

Definition 2.10. Given a tree T, a bipartition of the leaves of T is a decomposition
$L(T) = L_1 \cup L_2$ where $L_1 \cap L_2 = \emptyset$. We denote it by $L_1 \mid L_2$. Notice that every edge
e of T induces a bipartition $L_1 \mid L_2$ of L(T) by removing it; such a bipartition is
called an edge split of T and will be denoted by the same letter e.
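
Edge splits are easy to compute. The following Python sketch, in which the adjacency-dictionary encoding of the tree and the function name `edge_splits` are hypothetical choices, removes each edge in turn and records the induced bipartition of the leaves.

```python
def edge_splits(adjacency, leaves):
    """Bipartitions of the leaf set induced by removing each edge of a tree.

    `adjacency` maps every vertex to the set of its neighbours.
    Returns a dict {edge: frozenset({L1, L2})} with L1, L2 frozensets of leaves.
    """
    leaves = frozenset(leaves)
    splits = {}
    for u in adjacency:
        for v in adjacency[u]:
            if (v, u) in splits:
                continue  # each undirected edge is handled once
            # Collect the vertices reachable from u without crossing {u, v}.
            seen, stack = {u}, [u]
            while stack:
                x = stack.pop()
                for y in adjacency[x]:
                    if y not in seen and {x, y} != {u, v}:
                        seen.add(y)
                        stack.append(y)
            side = leaves & seen
            splits[(u, v)] = frozenset({side, leaves - side})
    return splits

# Quartet tree with cherries {1, 2} and {3, 4}; 'a' and 'b' are interior vertices.
quartet = {1: {"a"}, 2: {"a"}, 3: {"b"}, 4: {"b"},
           "a": {1, 2, "b"}, "b": {3, 4, "a"}}
splits = edge_splits(quartet, [1, 2, 3, 4])
nontrivial = {s for s in splits.values() if all(len(part) > 1 for part in s)}
```

For the quartet, the four terminal edges give the trivial splits separating one leaf from the rest, and the interior edge gives the only nontrivial split 12|34.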

2.1. Representation Theory. We will make use of the representation theory of groups.
Basic references are the books [Ser77] and [FH91], and the reader is referred
to them for definitions and well-known facts.

From now on, write $\Omega_G = \{\omega_1, \dots, \omega_s\}$ for the set of irreducible characters of G.
It is known that any two representations with the same character are isomorphic
(Corollary 2 of §2 of [Ser77]). As a consequence of this and Schur's lemma (see §2.2
of [Ser77]) we obtain the following fundamental result in representation theory:

Lemma 2.11. Let $N_\omega, N_{\omega'}$ be irreducible linear representations of G with
associated characters $\omega, \omega' \in \Omega_G$, and let $f : N_\omega \to N_{\omega'}$ be a G-module homomorphism.

(i) If $\omega \neq \omega'$, then f = 0;

(ii) if $\omega = \omega'$, then f is a homothety.

In particular, $\mathrm{Hom}_G(N_\omega, N_\omega) \cong \mathbb{C}$.

For every irreducible character $\omega_t \in \Omega_G$, fix an irreducible G-module $N_{\omega_t}$ with
associated character $\omega_t$. Then, for any G-module V, there exists a unique
decomposition of V into isotypic components:

(2.2)   $V \cong \bigoplus_{t=1}^{s} V[\omega_t]$

where each $V[\omega_t]$ is isomorphic to $N_{\omega_t} \otimes \mathbb{C}^{m(\omega_t, V)}$ for some multiplicity $m(\omega_t, V)$,
t = 1, ..., s. We also have that if V' is another representation of G, then

(2.3)   $\mathrm{Hom}_G(V, V') \cong \bigoplus_{t=1}^{s} \mathrm{Hom}_{\mathbb{C}}(\mathbb{C}^{m(\omega_t, V)}, \mathbb{C}^{m(\omega_t, V')})$

Going back to our fixed vector space W, we already know that the space $\otimes^l W$,
$l > 0$, is a G-representation as well and, as such,

$$\otimes^l W \cong \bigoplus_{t=1}^{s} N_{\omega_t} \otimes \mathbb{C}^{m(\omega_t, \otimes^l W)}.$$

We will denote by $\mathbf{m}(l)$ the s-tuple

$$\mathbf{m}(l) = (m(\omega_1, \otimes^l W), \dots, m(\omega_s, \otimes^l W)).$$

In particular, $\mathbf{m}(1)$ will be denoted by $\mathbf{m} = (m_1, \dots, m_s)$. Moreover, if $\chi$ denotes
the character associated to the representation $\rho : G \to GL(W)$, the decomposition
(2.2) above induces an equality of characters

$$\chi = \sum_{t=1}^{s} m_t \omega_t.$$

If $a = (a_t)_{t=1,\dots,s}$, $b = (b_t)_{t=1,\dots,s} \in \mathbb{N}^s$, we write $a \leq b$ if $a_t \leq b_t$ for each t = 1, ..., s.
Similarly, $\min\{a, b\}$ is the s-tuple given by the minimum of each entry.

Lemma 2.12. With this notation, we have $\mathbf{m}(l) \leq \mathbf{m}(l')$ if $l \leq l'$.

Proof. We prove that $\mathbf{m}(l) \leq \mathbf{m}(l+1)$ for any l. First of all, we show that if $\omega_1 \in \Omega_G$
is the trivial character, then $m_1 \geq 1$. To this aim, notice that the vector $\sum_{b \in B} b \in W$
is invariant under the action of any $g \in G$. In particular, we have $\sum_{b \in B} b \in W[\omega_1]$ and
so $\omega_1$ does appear in the decomposition of $\chi$ with nonzero coefficient. Now, given
$l > 0$, write $\chi^l = \sum_t a_t \omega_t$. The claim follows from the fact that the coefficient of
any irreducible character of G, say $\omega_t$, in $\chi^{l+1}$ is just $m_1 a_t + \dots \geq a_t$. □
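
The tuples $\mathbf{m}(l)$ can be computed from characters by the standard formula $m(\omega, V) = \frac{1}{|G|} \sum_{g \in G} \chi_V(g) \overline{\omega(g)}$, together with the fact that the character of $\otimes^l W$ is $\chi^l$. The Python sketch below does this for the Kimura 3-parameter group, taken as an example; the character table of the Klein four-group is entered by hand, and all names are hypothetical.

```python
from fractions import Fraction

B = "ACGT"

def perm(cycles):
    """Permutation of B given as a list of cycles, e.g. ["AC", "GT"]."""
    image = {b: b for b in B}
    for cyc in cycles:
        for i, b in enumerate(cyc):
            image[b] = cyc[(i + 1) % len(cyc)]
    return image

# Elements of the Kimura 3-parameter group: id, a, b and ab.
GROUP = [perm([]), perm(["AC", "GT"]), perm(["AG", "CT"]), perm(["AT", "CG"])]

# Irreducible characters of the Klein four-group, listed on (id, a, b, ab).
# They are real, so no complex conjugation is needed below.
IRREDUCIBLE = [(1, 1, 1, 1), (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1)]

def chi(g):
    """Character of the permutation representation: number of fixed letters."""
    return sum(1 for b in B if g[b] == b)

def multiplicities(l):
    """The tuple m(l): multiplicity of each irreducible character in the l-th tensor power of W."""
    result = []
    for omega in IRREDUCIBLE:
        inner = sum(Fraction(chi(g) ** l * w) for g, w in zip(GROUP, omega))
        result.append(int(inner / len(GROUP)))
    return tuple(result)
```

For this group one finds $\mathbf{m} = (1, 1, 1, 1)$ and $\mathbf{m}(2) = (4, 4, 4, 4)$, in agreement with the monotonicity stated in Lemma 2.12.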

Notation 2.13. If A is a matrix in $M_{m,n}$, we say that A has maximal rank if
$\mathrm{rk}(A) = \min\{n, m\}$. Following [AR07], if $\mathbf{m}$, $\mathbf{n}$ are s-tuples of positive integers
we will use the notation $M_{\mathbf{m},\mathbf{n}}$ to denote the space $M_{m_1,n_1} \times \dots \times M_{m_s,n_s}$, and if
$A = (A_1, \dots, A_s) \in M_{\mathbf{m},\mathbf{n}}$, we will write

$$\mathbf{rk}(A) = (\mathrm{rk}(A_1), \dots, \mathrm{rk}(A_s)).$$

Notice that $M_{\mathbf{m},\mathbf{n}}$ can be understood as the subspace of $M_{\sum_t m_t, \sum_t n_t}$ given by the
block-diagonal matrices with blocks of sizes $m_t \times n_t$. Then, $A \in M_{\mathbf{m},\mathbf{n}}$ has maximal
rank as a matrix in $M_{\sum_t m_t, \sum_t n_t}$ if and only if $\mathbf{rk}(A) = \min\{\mathbf{m}, \mathbf{n}\}$.

2.2. Flattenings and thin flattenings. The following definitions will be crucial
for our purposes.

Definition 2.14. Let T be a phylogenetic tree on (G, W) and let $L_1 \mid L_2$ be a
bipartition of its leaves with $l_1 = \#L_1$ and $l_2 = \#L_2$. Let $\psi$ be a G-tensor on T.

The flattening of $\psi$ along $L_1 \mid L_2$, denoted by $\mathrm{flat}_{L_1|L_2}\psi$, is the image of $\psi$ via
the isomorphism

$$\mathcal{L}(T)^G \cong \mathrm{Hom}_G(\otimes_{L_1} W, \otimes_{L_2} W).$$

The thin flattening of $\psi$ along $L_1 \mid L_2$ is the s-tuple of linear maps, denoted by
$Tf_{L_1|L_2}(\psi)$, obtained from $\mathrm{flat}_{L_1|L_2}\psi$ via the isomorphism

$$\mathrm{Hom}_G(\otimes_{L_1} W, \otimes_{L_2} W) \cong \bigoplus_{t=1}^{s} \mathrm{Hom}_{\mathbb{C}}(\mathbb{C}^{\mathbf{m}(l_1)_t}, \mathbb{C}^{\mathbf{m}(l_2)_t}).$$

Remark 2.15. Notice that the composition of linear maps induces a composition
of flattenings and thin flattenings. Notice also that if $\psi \in \mathcal{L}(T)^G$ and $L_1 \mid L_2$ is a
bipartition of L(T), then

$$(\mathrm{flat}_{L_1|L_2}\psi)(u) = (\psi \mid u), \quad \forall u \in \otimes_{L_1} W,$$

where $(\cdot \mid \cdot)$ is the bilinear operation between tensors defined above.

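Notation 2.16. If $Tf_{L_1|L_2}(\psi) = (\psi_1, \psi_2, \dots, \psi_s)$, we write

$$\mathbf{rk}\, Tf_{L_1|L_2}(\psi) = (\mathrm{rk}(\psi_1), \dots, \mathrm{rk}(\psi_s)).$$

We also denote $\mathrm{rk}\, Tf_{L_1|L_2}(\psi) = \sum_{t=1}^{s} \mathrm{rk}(\psi_t)$ and call it the rank of $Tf_{L_1|L_2}(\psi)$.
Clearly, this definition is coherent with the usual definition of rank if we regard
$Tf_{L_1|L_2}(\psi)$ as a $\mathbb{C}$-linear map $\mathbb{C}^{\sum_t \mathbf{m}(l_1)_t} \to \mathbb{C}^{\sum_t \mathbf{m}(l_2)_t}$.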
The following easy lemma is left to the reader:

Lemma 2.17. We have that $\mathrm{rk}\, \mathrm{flat}_{L_1|L_2}\psi = \sum_{t=1}^{s} \dim_{\mathbb{C}}(N_{\omega_t})\, \mathrm{rk}(\psi_t)$. Moreover, the
following are equivalent:

(i) $\mathrm{rk}\, \mathrm{flat}_{L_1|L_2}\psi$ is maximal;

(ii) $\mathrm{rk}(\psi_t)$ is maximal for all $t \in \{1, 2, \dots, s\}$;

(iii) $\mathbf{rk}\, Tf_{L_1|L_2}(\psi)$ is maximal;

(iv) $\mathrm{rk}\, Tf_{L_1|L_2}(\psi)$ is maximal.

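Remark 2.18. Once a basis for every $\mathbb{C}^{\mathbf{m}(l)_t}$ is chosen, we can identify

$$\bigoplus_{t=1}^{s} \mathrm{Hom}_{\mathbb{C}}(\mathbb{C}^{\mathbf{m}(l_1)_t}, \mathbb{C}^{\mathbf{m}(l_2)_t})$$

with the space of block-diagonal matrices $M_{\mathbf{m}(l_1), \mathbf{m}(l_2)}$. The notation introduced is
coherent with Notation 2.13.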

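Lemma 2.19. Let $\varphi_1 \in (\otimes_{L_1 \cup C} W)^G$ and $\varphi_2 \in (\otimes_{L_2 \cup C} W)^G$ be two tensors. Then,

(a) $\mathrm{flat}_{L_1|L_2}(\varphi_1 * \varphi_2) = \mathrm{flat}_{L_1|C}(\varphi_1) \circ \mathrm{flat}_{C|L_2}(\varphi_2)$;

(b) $Tf_{L_1|L_2}(\varphi_1 * \varphi_2) = Tf_{L_1|C}(\varphi_1) \circ Tf_{C|L_2}(\varphi_2)$.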

Notation 2.20. Given an edge e, we denote $1_e = \sum_{b \in B} b \otimes b \in (W_{e_0} \otimes W_{e_1})^G$.
Given a phylogenetic tree T, we write $1_T = (1_e)_{e \in E(T)}$ and call it the no-mutation
presentation of T.

2.3. Degenerated trees and trees with observed interior vertices. For
technical reasons, we admit degenerated trees reduced to just one vertex, which is
considered as a leaf. If $T = \bullet_q$ is such a tree, we associate the $\mathbb{C}$-vector space W to
q, making T a phylogenetic tree on (G, W). Moreover, we take $\mathrm{Par}_G(T)$ to
consist only of the no-mutation presentation, that is,

$$\mathrm{Par}_G(T) = \{1_q\} \quad \text{where} \quad 1_q = \sum_{u \in B} u \otimes u,$$

and define $\Psi_T : \mathrm{Par}_G(T) \to \mathcal{L}(T)^G$, where $\mathcal{L}(T) = W$, by mapping $1_q$ to $\sum_{u \in B} u$. The reader
can think of such a tree as a two-leaf tree where we only accept the no-mutation
presentation between its two leaves. All the above definitions are coherent with this
interpretation.

Definition 2.21. Let T be a phylogenetic tree on (G, W) and let $q \in \mathrm{Int}(T)$. Then,
we can write $T = *_{i=1}^{m} T_i$, where the $T_i$ are subtrees of T sharing the vertex q as a common
leaf. Write $T_0$ for the degenerated tree reduced to q. The tree T with observed q is
defined by

$$T^q := T_0 * T_1 * \dots * T_m.$$

Notice that, by definition, the leaves of $T^q$ are the leaves of T, $L(T^q) = L(T)$, while
$\mathcal{L}(T^q) = \mathcal{L}(T) \otimes W$. Define a map

$$\Psi_T^q : \mathrm{Par}_G(T) \to \mathcal{L}(T^q)$$

by taking $\Psi_T^q(A) = \Psi_{T_0}(1_q) * \Psi_{T_1}(A_1) * \dots * \Psi_{T_m}(A_m)$, where $A_i$ is the restriction
of the G-evolutionary presentation A to the subtree $T_i$.

3. The ideal of an equivariant model 

In this section, we essentially prove that edge invariants are indeed phylogenetic 
invariants (see Introduction). The proof of this result is quite technical as it is valid 
for any equivariant model. 

Given a phylogenetic tree T on (G, W) and a bipartition $\beta = L_1 \mid L_2$ of
the leaves of T, define $T_1$ (resp. $T_2$) as the minimal subtree of T that contains the
leaves in $L_1$ (resp. $L_2$). Clearly, we have $V(T) = V(T_1) \cup V(T_2)$. Given two vertices
$p, q \in V(T)$, the chain ch(q, p) is the linear subtree composed of the edges and
vertices between q and p. Define a binary relation $\sim_{L_1}$ among the leaves of $L_1$
as follows:

$$x \sim_{L_1} y \quad \text{if} \quad ch(x, y) \cap T_2 = \emptyset.$$

Analogously, a binary relation $\sim_{L_2}$ can be defined. It is easy to check that both
$\sim_{L_1}$ and $\sim_{L_2}$ are equivalence relations. Write $n_1$ and $n_2$ for the numbers of
equivalence classes of $\sim_{L_1}$ and $\sim_{L_2}$, respectively. For i = 1, 2, write $\{L_{i,j}\}_{j=1,\dots,n_i}$
for the resulting equivalence classes in $L_i$, so that $L_i = \bigcup_{j=1}^{n_i} L_{i,j}$. Notice that if
$l_1 = \#L_1$ and $l_2 = \#L_2$, then $n_1 \leq l_1$ and $n_2 \leq l_2$. From now on, we will denote

$$\mathbf{m}_{\beta,T} = \mathbf{m}(\min\{n_1, n_2\}).$$
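
The numbers $n_1$ and $n_2$ are purely combinatorial and can be computed from the tree. The Python sketch below follows the definitions above literally; the adjacency-dictionary encoding of the tree and all function names are hypothetical.

```python
def path(adjacency, start, goal):
    """Vertices on the unique path (chain) between two vertices of a tree."""
    stack = [(start, [start])]
    while stack:
        vertex, walked = stack.pop()
        if vertex == goal:
            return walked
        for nxt in adjacency[vertex]:
            if nxt not in walked:
                stack.append((nxt, walked + [nxt]))
    raise ValueError("vertices are not connected")

def spanned_subtree(adjacency, leaves):
    """Vertex set of the minimal subtree containing the given leaves."""
    leaves = list(leaves)
    vertices = set(leaves)
    for other in leaves[1:]:
        vertices.update(path(adjacency, leaves[0], other))
    return vertices

def count_classes(adjacency, own, other):
    """Number of classes of the relation on `own`: x ~ y when ch(x, y) misses the subtree spanned by `other`."""
    blocked = spanned_subtree(adjacency, other)
    classes = []
    for x in own:
        for cls in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if not blocked & set(path(adjacency, x, cls[0])):
                cls.append(x)
                break
        else:
            classes.append([x])
    return len(classes)

def n1_n2(adjacency, L1, L2):
    """The pair (n_1, n_2) attached to the bipartition L1 | L2."""
    return (count_classes(adjacency, L1, L2), count_classes(adjacency, L2, L1))

# Quartet tree with cherries {1, 2} and {3, 4}; 'a' and 'b' are interior vertices.
quartet = {1: {"a"}, 2: {"a"}, 3: {"b"}, 4: {"b"},
           "a": {1, 2, "b"}, "b": {3, 4, "a"}}
```

For the quartet with cherries {1, 2} and {3, 4}, the bipartition 12|34 gives $(n_1, n_2) = (1, 1)$, while 13|24 gives $(2, 2)$, so that $\mathbf{m}_{\beta,T}$ equals $\mathbf{m}$ in the first case and $\mathbf{m}(2)$ in the second.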

The main goal of this section is to prove the following proposition, which is a
generalization of [Eri05, Theorem 19.5] to equivariant models. Its interest lies in
the fact that it translates the topology of a tree into rank conditions on suitable
matrices.

Proposition 3.1. Let T be a trivalent phylogenetic tree on (G, W) and let
$\beta = L_1 \mid L_2$ be a bipartition of L(T) as above. Then, we have

$$\mathbf{rk}\, Tf_\beta(\psi) \leq \mathbf{m}_{\beta,T} \quad \forall \psi \in V(T),$$

and there exists a non-empty Zariski open set $U_\beta \subset V(T)$ such that the equality
holds for every $\psi \in U_\beta$. Moreover,

(i) $\beta$ is an edge split in T if and only if $\mathbf{m}_{\beta,T} = \mathbf{m}$.

(ii) If $\beta$ is not an edge split in T, then $\mathbf{m}_{\beta,T} \geq \mathbf{m}(2)$.

The existence of the Zariski open subset above where the flattening attains the 
expected rank cannot be proven by a simple dimension counting as the following 
example shows. 
Example 3.2. Consider $G = \{\mathrm{id}\} \subset \mathfrak{S}_4$ and the quartet tree T having an inner
edge e. Then $Tf_e(\psi)$ can be seen as a $16 \times 16$ matrix M and its expected rank is 4
according to Proposition 3.1 (i). The variety $V_G(T)$ has dimension 60 and is contained
in the determinantal variety defined by the $5 \times 5$ minors of M, which has dimension
256 - (16 - 5 + 1)(16 - 5 + 1) = 112. A priori $V_G(T)$ could also be included in the
variety of $4 \times 4$ minors of M, which has dimension 256 - (16 - 4 + 1)(16 - 4 + 1) = 87,
so that a general element of $V_G(T)$ would not have the expected rank 4.
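
The numbers in this example, and the ranks predicted by Proposition 3.1 for the general Markov model, can be checked numerically. The Python sketch below is an illustration under stated assumptions, not part of the proof: it uses exact rational arithmetic, random integer edge matrices drawn with a fixed seed, the parameterization without root distribution, and hypothetical function names.

```python
from fractions import Fraction
from itertools import product
import random

K = 4  # number of states

def rank(matrix):
    """Rank over the rationals by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def determinantal_dim(n, minor):
    """Dimension of the n x n matrices whose minors of size `minor` all vanish."""
    return n * n - (n - minor + 1) ** 2

def quartet_tensor(rng):
    """Point of the quartet 12|34 variety from random integer edge matrices."""
    A = [[[rng.randint(1, 10**6) for _ in range(K)] for _ in range(K)]
         for _ in range(5)]
    A1, A2, A3, A4, Ae = A
    return {(x1, x2, x3, x4): sum(Ae[z][y] * A1[x1][y] * A2[x2][y]
                                  * A3[x3][z] * A4[x4][z]
                                  for y in range(K) for z in range(K))
            for x1, x2, x3, x4 in product(range(K), repeat=4)}

def flattening(p, rows, cols):
    """Matrix of a 4-leaf tensor: leaf positions `rows` index rows, `cols` columns."""
    def entry(r, c):
        x = [0] * 4
        for pos, val in zip(rows + cols, r + c):
            x[pos] = val
        return p[tuple(x)]
    states = list(product(range(K), repeat=2))
    return [[entry(r, c) for c in states] for r in states]

p = quartet_tensor(random.Random(0))
rank_edge_split = rank(flattening(p, (0, 1), (2, 3)))   # split 12|34
rank_other_split = rank(flattening(p, (0, 2), (1, 3)))  # split 13|24
```

Provided the five random edge matrices are invertible, which holds for generic choices, the flattening along the edge split 12|34 has rank 4 and the flattening along 13|24 has rank 16, in line with Proposition 3.1. The function `determinantal_dim` reproduces the dimensions 112 and 87 quoted in the example.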

Before proving Proposition 3.1, we need to state a couple of lemmas.

Lemma 3.3. Let $T$ be a phylogenetic tree on $(G, W)$ and let $\beta = L_1 \mid L_2$ be a
bipartition of $L(T)$ such that every cherry of $T$ is composed of one leaf in $L_1$ and
one leaf in $L_2$. For a generic $G$-evolutionary presentation $A$ of $T$, it holds that
$\mathrm{rk}\, Tf_\beta(\Psi_T(A)) = m_{\beta,T}$.

Proof. Write $L_1 = \{u_1, u_2, \ldots, u_{l_1}\}$ and $L_2 = \{v_1, v_2, \ldots, v_{l_2}\}$, and write $n = l_1 + l_2$
for the number of leaves of $T$. Assume that $1 \le l_1 \le l_2$. Notice that with our
assumption, we have $n_1 = l_1$, $n_2 = l_2$ and so $m_{\beta,T} = m(l_1)$. To reach the claim, we
first show that the above rank condition defines an open set in $\mathrm{Par}_G(T)$.
Then, we will prove recursively that this open set is not empty.

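Let $\varphi \in \mathcal{L}(T)^G$ and write $Tf_\beta(\varphi) = (\varphi_1, \varphi_2, \ldots, \varphi_s)$. Then $Tf_\beta(\varphi)$ has maximal
rank if and only if

(3.1) $\mathrm{rk}\, \varphi_t = \min\{m(n_1)_t, m(n_2)_t\}$ for every $t = 1, \ldots, s$.

Each rank condition $\mathrm{rk}\, \varphi_t < \min\{m(n_1)_t, m(n_2)_t\}$, $t = 1, \ldots, s$, defines a closed
proper subset $Z_t$ of $\mathrm{Hom}_G(\otimes_{L_1} W, \otimes_{L_2} W) = \mathcal{L}(T)^G$. Thus,

\[ V' = \mathcal{L}(T)^G \setminus \bigcup_{t=1}^{s} Z_t \]

is a dense open subset in $\mathcal{L}(T)^G$, and for every $\varphi \in V'$,

\[ \mathrm{rk}\, Tf_\beta(\varphi) = m_{\beta,T}. \]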

Moreover, $\Psi_T : \mathrm{Par}_G(T) \to \mathcal{L}(T)^G$ is a continuous map, so $V = \Psi_T^{-1}(V')$ is an open
set in $\mathrm{Par}_G(T)$. To prove that $V$ is non-empty, we will recursively construct a
$G$-evolutionary presentation $A \in \mathrm{Par}_G(T)$ with $A_e = 1_e$ for any terminal edge $e$, and
such that

(3.2) $\mathrm{rk}\, \mathrm{flat}_{L_1|L_2} \Psi_T(A) = k^{l_1}$.

This implies that the rank of $\mathrm{flat}_{L_1|L_2} \Psi_T(A)$ is maximal and, by applying (2.17),
we derive that so is $\mathrm{rk}\, Tf_\beta(\Psi_T(A))$. From this, we derive that $\mathrm{rk}\, Tf_\beta(\Psi_T(A)) =
m(l_1) = m_{\beta,T}$.

For $n = 2$, it is enough to take $A$ equal to the no-mutation presentation: $1_T = \sum_{b \in B} b \otimes b$.
For general $n$, take a cherry of $T$. By reordering the leaves of $L_1$ and $L_2$
if necessary, we can assume that the leaves in this cherry are $u_{l_1}, v_{l_2}$. Let $e$ be the
edge of $T$ adjacent to it and insert two vertices $q_1$ and $q_2$ in the edge $e$. We obtain
a decomposition of $T$ as follows: $T = T^1 * T^e * T^2$, where $T^e$ is a 2-leaf tree with
leaves $q_1$ and $q_2$, and $T^1$ and $T^2$ are the subtrees of $T$ obtained when removing $T^e$
from $T$ as shown in Figure 2. Then, we have

\[ L(T^1) = \{u_1, u_2, \ldots, u_{l_1 - 1}, q_1, v_1, \ldots, v_{l_2 - 1}\}, \]
\[ L(T^e) = \{q_1, q_2\}, \]
\[ L(T^2) = \{u_{l_1}, v_{l_2}, q_2\}. \]

Figure 2. Decomposition of $T$ in the proof of Lemma 3.3.

Write Li = {u\, . . . , w^-i, qi} and L 2 = {v\, ■ ■ ■ , f/ 2 -i}- Since T 1 has n — 1 
leaves, our assumption says that there is some G-evolutionary presentation, say 
A 1 G Par G (T 1 ), such that 

rk flat LW , Lm *Ti(A) =k™^ l *- 1 l 

Define $A = A^1 * A_e * 1_{T^2} \in \mathrm{Par}_G(T)$, where $A_e \in \mathrm{Hom}_G(W_{q_1}, W_{q_2})$ is generic, and we
will show that the equality (3.2) holds. To this aim, we claim that

(3.3) $\Psi_T(A) = (\Psi_{T^1}(A^1) \otimes 1) *_{L_1 \cup \{q\}} \phi(A_e)$.

Proof. First of all, the decomposition A = A 1 * A e * induces a decomposition of 
ty T (A) as 

= ^ (A e \ Zl ® z 2 ){^ T i(A 1 ) | zi) ® (^t<1t 2 ) I Z2). 

and notice that 

(^ T i(A X ) | 2i) ® (^ T 2(1 T2 ) I z 2 ) = (^(A 1 ) I Zt) ® (z 2 ® z 2 ) = 

(* T i (A 1 ) I 6i ® . . . ® 6/^1 ® «i) g> (z 2 ® 22) ® &i ® • • • ® 

i=l,...,h-l 
Thus, we obtain that $\Psi_T(A)$ is equal to

^2 ( A e I zi ® Z 2 )(^ T i(A 1 ) | 61 ® . . . (glfyj-l (8> Zl) <8> ^2 <8> ^2 <8 61 (8) . • • ® feji-l. 

zi,«a,6i,... 1 6i 1 -i 

On the other hand, consider the G-equivariant map 

<p(A e ) : ® Ll u{ qi }W -> ® Ll W 
61 <g> . . . <g b h -i ® b h <g 6 ?1 h-> (A e I ft/! <8> b qi )bi ® . . . <g 6^-1 <g 

and define the tensor 0(A e ) as the image of <p(A e ) via the isomorphism 

Eom G {® LlU{qi} W,® Ll W) S ((<8> LlU { gi }W0 <8> (® Ll l^)) G . 
Thus, if 6j G £> Ui , i = 1, . . . , l\ — 1, Z\ G and z 2 £ Ad then 

(4>{A e ) I 61 <g ... <g (g zi (8) z 2 ) = (A e I Zi <g ^ 2 )-22 (g &i <g . . . <g 
If 1 = 6 <g> 6 G Wu, (g , we have (1 | z 2 ) = ^2 and so 

(^(A 1 ) (g) 1 I 61 ® . . . ® 6 (l _i ® «i ® 2 2 ) = (^T^A 1 ) I 61 ® ... ® ftfe-i ® Zx) ® Z 2 . 

Putting all together, we obtain that (^-^(A 1 ) (g) 1) * 4>{A e ) is equal to 

LiU{g} 

^ (A e I zi <8 z 2 )(^t 1 (-4 1 ) I &i <8> •• • <8 ft/1-1 <8 zi) ® Z2 <8> (z 2 <8 61 <8 . . . ® fe^-i) . 

b-L,...,b ll ^i,Zx,Z2 

This proves the claim. 

Now, apply Lemma 2.19 to (3.3) to get

flat LllL2 y T (A) = flatL^u^y^T^A 1 ) <g> l)flatL lU{q}{L2 (j)(A e ). 

It is straightforward to check that 

flat LlU{q}l L 2 {^^(A 1 ) <g> l) = (V^w^w^ti (A 1 )) <g> (/iot {lt|i} | {fl!a} l) , 

so the rank of $F := \mathrm{flat}_{L_1 \cup \{q\} | L_2}(\Psi_{T^1}(A^1) \otimes 1)$ is equal to the product of ranks:
$k \times k^{\min\{l_1 - 1,\, l_2\}} = k^{\min\{l_1,\, l_2 + 1\}}$. On the other hand, write $G(A_e)$ for the matrix of
$\mathrm{flat}_{L_1 | L_1 \cup \{q\}}\, \phi(A_e)$ in the bases $B(\otimes_{L_1} W)$ and $B(\otimes_{L_1 \cup \{q\}} W)$. It is a block diagonal
matrix, each block being a convenient column of the matrix $A_e$. Then the rank of
$\mathrm{flat}_{L_1|L_2} \Psi_T(A)$ is $k^{l_1}$ if and only if $\mathrm{Ker}(F) \cap \mathrm{Im}(G(A_e)) = \{0\}$. Since this holds for
a generic matrix $A_e$, the claim follows. □

Lemma 3.4. Let $T$ be a phylogenetic tree on $(G, W)$ and let $q \in \mathrm{Int}(T)$. Assume that
$q$ has degree two while the remaining interior vertices have degree three and write
$T^q$ for the tree with observed $q$. Then, for a generic $G$-evolutionary presentation
$A \in \mathrm{Par}_G(T)$, we have that

\[ \mathrm{rk}\, Tf_{\{q\}|L(T)}(\Psi_{T^q}(A)) = m. \]

Figure 3. Construction of the trunk and the boughs of a given tree
$T$. Black and white dots represent the leaves in $L_1$ and $L_2$, respectively.
Notice that some boughs $T_{i,j}$ may be reduced to the vertex $u_{i,j}$. Indeed,
this happens if and only if $u_{i,j} \in L_i$.



Proof. First of all, notice that $\mathrm{rk}\, Tf_{\{q\}|L(T)}(\psi) \le m$ for all $\psi \in \mathcal{L}(T^q)^G$ and
that there is a non-empty open set $U \subset \mathcal{L}(T^q)^G$ where equality holds.
Indeed, if $Tf_{\{q\}|L(T)}(\psi) = (\psi_1, \ldots, \psi_s)$, it is enough to take $U = \bigcap_{i=1}^{s} U_i$, where each
$U_i$ is defined by asking that $\mathrm{rk}\, \psi_i = m_i$. Every $U_i$ is a dense open set in $\mathcal{L}(T^q)^G$, and
so is $U$.

To reach the claim we only have to prove that $\Psi_{T^q}^{-1}(U)$ is not empty. To this aim,
it will be enough to consider the no-mutation presentation $1 = \{1_e\}_{e \in E(T^q)}$. The
linear map $\mathrm{flat}_{\{q\}|L(T)} \Psi_{T^q}(1) : W \to \otimes_{L(T)} W$ defined by $b \mapsto b \otimes \ldots \otimes b$ has rank
equal to $\dim(W)$. By virtue of (2.17), we infer that $1 \in \Psi_{T^q}^{-1}(U)$ and we are done.
□

Now, we come back to the general case. So, let $T$ be a phylogenetic tree on $(G, W)$
and let $\beta = L_1 \mid L_2$ be a bipartition of $L(T)$. We introduce some terminology and
notation that will be helpful. For the sake of clarity, this notation will not reflect
its dependence on $\beta$, but confusion should not arise since the bipartition is fixed
throughout this section. Keep the notation introduced at the beginning of this
section:

1. For $i = 1, 2$ and $j = 1, \ldots, n_i$, denote by $T_{i,j}$ the minimal phylogenetic subtree
of $T$ containing $L_{i,j}$. These subtrees are called the boughs of $T$ relative to
$\beta$. Every $T_{i,j}$ has a distinguished vertex, denoted by $u_{i,j}$, which has degree
two. All the remaining interior vertices have degree 3. For $i = 1, 2$, write

\[ L_i^R = \{u_{i,j}\}_{j=1,\ldots,n_i}. \]

2. The trunk of $T$ relative to $\beta$, denoted by $T_R$, is the phylogenetic tree on
$(G, W)$ obtained when removing all the boughs from $T$. Equivalently, $T_R$ is
the minimal subtree of $T$ containing all the $u_{i,j}$, so that $L(T_R) = L_1^R \cup L_2^R$.

See Figure 3 for an example of this construction.
Notice that $T_R$ is reduced to a 2-leaf tree if and only if $\beta$ is an edge split of $T$.
Notice also that every cherry in $T_R$ is composed of one leaf in $L_1^R$ and one leaf in $L_2^R$.

Proof of Proposition 3.1. Every $G$-evolutionary presentation $A \in \mathrm{Par}_G(T)$
induces by restriction a $G$-evolutionary presentation $A_R$ in the trunk $T_R$ and
$G$-evolutionary presentations $A_{i,j}$ in the boughs $T_{i,j}$. Actually, the mappings $A \mapsto A_R$
and $A \mapsto A_{i,j}$ define continuous maps

\[ \pi_R : \mathrm{Par}_G(T) \to \mathrm{Par}_G(T_R), \qquad \pi_{i,j} : \mathrm{Par}_G(T) \to \mathrm{Par}_G(T_{i,j}). \]

Given $A \in \mathrm{Par}_G(T)$, we proceed to decompose $\Psi_T(A)$ in terms of tensors associated
to the trunk and the boughs of $T$. To this aim and for every $i, j$, consider the
tree $T_{i,j}^{u_{i,j}}$ with the vertex $u_{i,j}$ observed in it (see section 2.2). It is straightforward
to check that $T$ is recovered by joining these boughs to the trunk:

T = *(...* * (T 2 y *(...* (T%? * Tr) . . .)) . . .). 

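Regarding $A_{i,j}$ as a $G$-evolutionary presentation of $T_{i,j}^{u_{i,j}}$, write $\varphi_{i,j}(A) \in \mathcal{L}(T_{i,j}^{u_{i,j}})$ for
the image of $A_{i,j}$ by the map defined in Definition 2.21. Then write

\[ \varphi_1 = \otimes_{j=1}^{n_1} \varphi_{1,j}(A) \in (\otimes_{L_1^R \cup L_1} W)^G, \]

\[ \varphi_2 = \otimes_{j=1}^{n_2} \varphi_{2,j}(A) \in (\otimes_{L_2^R \cup L_2} W)^G, \]

\[ \varphi_R = \Psi_{T_R}(A_R) \in \mathcal{L}(T_R)^G. \]

From the construction of these three tensors, it is clear that (see Notation 2.5):

\[ \Psi_T(A) = \varphi_1 * \varphi_R * \varphi_2. \]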

Write $\beta_1 = L_1 \mid L_1^R$, $\beta_2 = L_2^R \mid L_2$ and $\beta_R = L_1^R \mid L_2^R$. By applying (2.19) we infer
the following equality of maps

\[ Tf_\beta(\Psi_T(A)) = Tf_{\beta_1}(\varphi_1)\, Tf_{\beta_R}(\Psi_{T_R}(A_R))\, Tf_{\beta_2}(\varphi_2) \]

and from it, the inequality $\mathrm{rk}\, Tf_\beta(\Psi_T(A)) \le m_{\beta,T}$. Next, we prove that the equality
actually holds for a generic $G$-evolutionary presentation. To this aim, we will show
that the three maps above have maximal rank for a generic $G$-evolutionary presentation
$A$ of $T$. It is straightforward to check that this will imply that $\mathrm{rk}\, Tf_\beta(\Psi_T(A)) =
m_{\beta,T}$ (use for instance the Frobenius inequality, see Section 2.9.6 of [Eve80]).

First of all, we infer from Lemma 3.3 that there is a dense open set $V_R \subset \mathrm{Par}_G(T_R)$
such that for every $B \in V_R$, $\mathrm{rk}\, Tf_{\beta_R}(\Psi_{T_R}(B)) = m_{\beta_R, T_R}$. Since the map $\pi_R :
\mathrm{Par}_G(T) \to \mathrm{Par}_G(T_R)$ is surjective, the set

\[ U_R = \pi_R^{-1}(V_R) \]

is a non-empty Zariski open set in $\mathrm{Par}_G(T)$.
Similarly, for every $i = 1, 2$ and $j = 1, 2, \ldots, n_i$, let $V_{i,j} \subset \mathrm{Par}_G(T_{i,j})$ be the dense
open set defined by Lemma 3.4 applied to $T_{i,j}^{u_{i,j}}$, and $U_{i,j} = \pi_{i,j}^{-1}(V_{i,j})$. Since $\mathrm{Par}_G(T)$
is irreducible, it is clear that

\[ U_i = \bigcap_{j=1}^{n_i} U_{i,j}, \qquad i = 1, 2 \]

is non-empty and open. On the other hand, for any $A \in \mathrm{Par}_G(T)$, we have

\[ Tf_{\beta_i}(\varphi_{i,1}(A) \otimes \ldots \otimes \varphi_{i,n_i}(A)) = \otimes_{j=1}^{n_i} Tf_{L_{i,j}|\{u_{i,j}\}}(\varphi_{i,j}(A)) \]

and therefore (see for instance [Eve80])

\[ \mathrm{rk}\, Tf_{\beta_i}(\varphi_i) = \prod_{j=1}^{n_i} \mathrm{rk}\, Tf_{L_{i,j}|\{u_{i,j}\}}(\varphi_{i,j}(A)) = m(n_i). \]

Thus, if $A \in U_i$, the rank of $Tf_{\beta_i}(\varphi_i)$ is maximal.
To finish the proof it is enough to take

\[ U_\beta = \Psi_T(U_1 \cap U_R \cap U_2) \subset V(T). \]

Finally, if $\beta = L_1 \mid L_2$ is an edge split, then $n_1 = n_2 = 1$ and $T_R$ is a 2-leaf tree. It
follows that $m_{\beta,T} = m$. This proves (i). If $\beta$ is not an edge split, it is clear that
$n_1, n_2 \ge 2$ and the claim of (ii) follows by Lemma 2.12. □

Remark 3.5. The preceding proof actually shows that the dense open set $U_{L_1|L_2} \subset
\mathrm{Par}_G(T)$ cuts the set of stochastic parameters, i.e.

\[ U_{L_1|L_2} \cap \prod_{e \in E(T)} \Delta^G \neq \emptyset, \]

where $\Delta^G$ is the set of Markov matrices, that is, matrices whose entries are all
non-negative and whose columns sum to 1. Indeed, as suggested by Lemma 3.3 and
Lemma 3.4, it is enough to take $A \in \mathrm{Par}_G(T)$ with $A_e = \sum_{b \in B} b \otimes b$ whenever
$e \in E(T_R)$ is terminal or $e \in E(T_{i,j})$ for some $i, j$, and $A_e$ a generic Markov matrix
otherwise.

Proposition 3.1 suggests the following definitions.

Definition 3.6. If $L_1 \mid L_2$ is a bipartition of $L(T)$, the ideal of $L_1 \mid L_2$, denoted by
$I_{L_1|L_2}$, is the ideal in the coordinate ring of $\mathcal{L}(T)^G$ defined by the conditions

\[ \mathrm{rk}\, Tf_{L_1|L_2}(\psi) \le m, \]

$\psi \in \mathcal{L}(T)^G$ being a tensor of indeterminates. Equivalently, $I_{L_1|L_2}$ is generated by
the $(m_t + 1)$-minors of the $t$-th box of $Tf_{L_1|L_2}(\psi) \in M_{m(l_1), m(l_2)}$, for $t = 1, \ldots, s$ (see
Notation ElS]). 
Notation 3.7. Let $T$ be a phylogenetic tree on $(G, W)$ and let $e$ be an edge of $T$
that splits the leaves into two sets $L_1$ and $L_2$ of cardinality $l_1$ and $l_2$, respectively.
The ideal $I_{L_1|L_2}$ will also be denoted by $I_e$. Due to Proposition 3.1, we have that if $e$
belongs to $E(T)$, then $I_e \subset I(T)$.

Definition 3.8. The edge invariants of $T$ are the elements of the ideal $\sum_{e \in E(T)} I_e$.

Proposition 3.1 proves that edge invariants are phylogenetic invariants, that is,
elements in $I(T)$ that do not vanish on all points of $\bigcup_T V(T)$, where the union runs
over all trivalent tree topologies. Indeed, given a G-spaced tree $T$ and an edge
$e \in E(T)$, there exist trivalent trees that do not have $e$ as an edge split and so $I_e$ is
not contained in $I(\bigcup_T V(T))$.

It is worth highlighting that using Proposition 3.1 we also obtain the generic
identifiability of the tree topology for equivariant models. The tree topology of
a model of sequence mutation is said to be generically identifiable if for generic
choices of stochastic parameters $A \in \prod_{e \in E(T)} \Delta^G$, $A' \in \prod_{e \in E(T')} \Delta^G$ (see Remark
3.5), $\Psi_T(A) = \Psi_{T'}(A')$ implies $T = T'$ (see for instance [AR06]). In order to prove
this kind of result, one only has to show that the corresponding irreducible varieties $V(T)$
and $V(T')$ are not contained one in the other. We obtain the following result, which
was already known for the general Markov model (see [Ste94]) and for group-based
models [SHSE92].

Corollary 3.9. The tree topology is generically identifiable in all equivariant evo- 
lutionary models. 

Proof. Let $T$, $T'$ be two different trivalent phylogenetic trees on $(G, W)$. Then
there is an edge split $e$ in $T$ that is not an edge split in $T'$. By Proposition 3.1, there
exists an element $f$ in $I_e$ (and therefore in $I(T)$) that does not belong to $I(T')$. In
terms of varieties this proves that $V(T') \not\subset V(T)$, and $V(T) \not\subset V(T')$ is proven
similarly. As $V(T)$ and $V(T')$ are irreducible varieties, this shows that they meet
properly. □

4. Phylogenetic Invariants 

The purpose of this section is to prove that, for phylogenetic reconstruction, the
only relevant invariants are the edge invariants introduced in the previous section.
This is a natural result if one takes into account the Splits Equivalence Theorem in
combinatorics (see Theorem 4.1 below). Let $\mathcal{T}$ be the set of trivalent tree topologies
with leaf set $\{v_1, v_2, \ldots, v_n\}$. Two bipartitions $L_1 \mid L_2$, $M_1 \mid M_2$ of a set $L$ are said to
be compatible if at least one of the four intersections $L_1 \cap M_1$, $L_1 \cap M_2$, $L_2 \cap M_1$,
$L_2 \cap M_2$ is empty. For example, if $L_1 \mid L_2$, $M_1 \mid M_2$ are two edge splits of the same
tree $T$, then they are compatible. We recall that any trivalent tree on $n$ leaves has
$2n - 3$ edges ($n$ terminal edges and $n - 3$ interior ones).
Theorem 4.1 ([Bun71], [PS05, Theorem 2.35]). A collection $B$ of $2n - 3$ bipartitions
is pairwise compatible if and only if there exists a tree $T \in \mathcal{T}$ such that $B$ is the set
of edge splits of $T$. Moreover, if such a tree $T$ exists then it is unique.
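The compatibility condition can be checked mechanically. The sketch below is only an illustration (the helper names `split`, `compatible` and `pairwise_compatible` are ours): it encodes a bipartition as a pair of frozensets and tests the $2n - 3 = 5$ edge splits of a quartet tree.

```python
from itertools import combinations


def split(part, leaves):
    """Bipartition of `leaves` determined by one of its two blocks."""
    part = frozenset(part)
    return (part, frozenset(leaves) - part)


def compatible(s, t):
    """True when at least one of the four pairwise intersections is empty."""
    return any(not (x & y) for x in s for y in t)


def pairwise_compatible(splits):
    """True when every two bipartitions in the collection are compatible."""
    return all(compatible(s, t) for s, t in combinations(splits, 2))


leaves = {"v1", "v2", "v3", "v4"}
# The 2n - 3 = 5 edge splits of the quartet tree with inner split v1 v2 | v3 v4.
quartet = [split({v}, leaves) for v in sorted(leaves)] + [split({"v1", "v2"}, leaves)]
quartet_ok = pairwise_compatible(quartet)  # True
# Two different inner splits of the same four leaves are never compatible.
clash = compatible(split({"v1", "v2"}, leaves), split({"v1", "v3"}, leaves))  # False
```

Recovering the tree itself from a compatible collection is a separate step that this sketch does not attempt.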

In order to make our result concerning phylogenetic invariants more precise we 
need to introduce some notation. 

We fix $G \subset \mathfrak{S}_k$ and $W$ as in section 2 and each topology $T \in \mathcal{T}$ will be considered
as a phylogenetic tree on $(G, W)$. Then all trees $T$ in $\mathcal{T}$ have the same space of
$G$-tensors, which will be denoted by $\mathcal{L} = (\otimes_{i=1}^{n} W)^G$.

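Definition 4.2. Let $o$ be an $s$-tuple and let $\beta = L_1 \mid L_2$ be a bipartition of
$\{v_1, v_2, \ldots, v_n\}$. Then we let $D^\beta_{\le o}$ be the subvariety of $\mathcal{L}$ defined as

\[ D^\beta_{\le o} = \{\psi \in \mathcal{L} \mid \mathrm{rk}\, Tf_\beta(\psi) \le o\} \]

and, if the thin flattening of $\psi \in \mathcal{L}$ is $Tf_\beta(\psi) = (\psi_1, \psi_2, \ldots, \psi_s)$, we define $D^\beta_{<o}$ to
be the set

\[ D^\beta_{<o} = \{\psi \in \mathcal{L} \mid \mathrm{rk}\, \psi_j < o_j \text{ for some } j\}. \]

For example, $D^\beta_{\le m}$ coincides with the set of zeroes $Z(I_{L_1|L_2})$. Notice that both $D^\beta_{\le o}$
and $D^\beta_{<o}$ are algebraic sets, although the second is not irreducible.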

Notation 4.3. Given a tree $T \in \mathcal{T}$ and using the notation of Proposition 3.1, for
each bipartition $\beta = L_1 \mid L_2$ of $\{v_1, v_2, \ldots, v_n\}$, we call $m_{\beta,T}$ the maximum rank
that $Tf_\beta(\psi)$ can have if $\psi$ belongs to $V(T)$. Then Proposition 3.1 shows that

\[ V(T) \subset D^\beta_{\le m_{\beta,T}} \]

and that $V(T) \setminus D^\beta_{< m_{\beta,T}}$ is a dense open subset of $V(T)$ for any bipartition $\beta = L_1 \mid
L_2$. We call this open subset $U_{T,\beta}$, so that $U_{T,\beta} = V(T) \setminus D^\beta_{< m_{\beta,T}}$ is the locus of
tensors $\psi \in V(T)$ that satisfy $\mathrm{rk}\, Tf_\beta(\psi) = m_{\beta,T}$. We define $U_T = \bigcap_\beta U_{T,\beta}$, where
the intersection is taken among all bipartitions of $\{v_1, v_2, \ldots, v_n\}$. As $V(T)$ is an
irreducible variety, $U_T$ is still a dense open subset of $V(T)$ and it corresponds to the
set of points in $V(T)$ whose flattening $Tf_\beta(\psi)$ along any bipartition $\beta$ of the set of
leaves of $T$ has the expected rank $m_{\beta,T}$.

With this set up in mind, the main result of this paper is the following. 

Theorem 4.4. For each $T \in \mathcal{T}$ let $U_T \subset V(T)$ be the dense open set defined above.
Let $p$ be a point in $\bigcup_{T \in \mathcal{T}} U_T \subset \mathcal{L}$ and let $T_0$ be any tree in $\mathcal{T}$. Then, $p$ belongs to
$V(T_0)$ if and only if $p$ belongs to the set of zeroes $Z(\sum_{e \in E(T_0)} I_e)$.

Remark 4.5. As we pointed out in the introduction, this result says that for a
general point on $\bigcup_{T \in \mathcal{T}} V(T)$, it is enough to evaluate the edge invariants to decide
to which variety $V(T)$ the point actually belongs.

This result would still hold for non-trivalent trees when imposing that all trees in
the corresponding set $\mathcal{T}$ have the same collection of degrees at interior vertices.
After all the technical issues in section 3, the proof of Theorem 4.4 is now
straightforward.

Proof of Theorem 4.4. By Proposition 3.1 we already know that $\sum_{e \in E(T_0)} I_e \subseteq I(T_0)$,
therefore if $p \in V(T_0)$, we immediately have that $p$ belongs to $Z(\sum_{e \in E(T_0)} I_e)$.

Conversely, let $p \in \bigcup_{T \in \mathcal{T}} U_T$. Then $p$ belongs to $U_T \subset V(T)$ for a certain $T \in \mathcal{T}$, so
that $\mathrm{rk}\, Tf_\beta(p) = m_{\beta,T}$ for any bipartition $\beta$ of $\{v_1, v_2, \ldots, v_n\}$. On the other hand,
if $p \in Z(\sum_{e \in E(T_0)} I_e)$, then $p \in Z(I_e)$ for any $e \in E(T_0)$ and hence $\mathrm{rk}\, Tf_e(p) \le m$
for all $e \in E(T_0)$. This implies that $m_{e,T} \le m$ for all $e \in E(T_0)$, which can only
happen if $e$ is a split of $T$ for all $e \in E(T_0)$ (see Proposition 3.1). But two trivalent
trees $T$ and $T_0$ on $n$ leaves have the same collection of splits if and only if $T = T_0$
(see Theorem 4.1), so the proof is concluded. □
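The rank criterion used in this proof can be tried out on a small example. The sketch below is only an illustration under explicit assumptions: it takes the general Markov model (trivial group, so $m = (4)$ and the thin flattening is the usual $16 \times 16$ flattening), a quartet tree with inner split 12|34, identity matrices on the four pendant edges (the no-mutation choice also used in the proofs of Section 3), and exact rational arithmetic. All function names are ours.

```python
from fractions import Fraction
from itertools import product

K = 4  # number of states


def quartet_tensor(pi, M, leaf_mats):
    """Tensor p[i, j, k, l] of the quartet 12|34.

    pi: distribution at the interior node a; M: K x K matrix on the inner
    edge a -> b; leaf_mats: matrices on the pendant edges (leaves 1, 2 hang
    from a, leaves 3, 4 from b). Rows are indexed by the parent state.
    """
    A1, A2, A3, A4 = leaf_mats
    return {
        (i, j, k, l): sum(
            pi[x] * M[x][y] * A1[x][i] * A2[x][j] * A3[y][k] * A4[y][l]
            for x in range(K) for y in range(K)
        )
        for i, j, k, l in product(range(K), repeat=4)
    }


def flattening(p, rows):
    """Matrix of p along the bipartition rows | complement (positions 0..3)."""
    cols = [pos for pos in range(4) if pos not in rows]
    states = list(product(range(K), repeat=2))
    mat = []
    for r in states:
        line = []
        for c in states:
            idx = [0] * 4
            idx[rows[0]], idx[rows[1]] = r
            idx[cols[0]], idx[cols[1]] = c
            line.append(p[tuple(idx)])
        mat.append(line)
    return mat


def rank(mat):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in mat]
    rk = 0
    for c in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            if m[r][c] != 0:
                factor = m[r][c] / m[rk][c]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


identity = [[Fraction(int(i == j)) for j in range(K)] for i in range(K)]
pi = [Fraction(1, 4)] * K
# Inner edge: stay with probability 7/10, move to each other state with 1/10.
M = [[Fraction(7, 10) if i == j else Fraction(1, 10) for j in range(K)] for i in range(K)]
p = quartet_tensor(pi, M, [identity] * 4)
splits = {"12|34": (0, 1), "13|24": (0, 2), "14|23": (0, 3)}
ranks = {name: rank(flattening(p, rows)) for name, rows in splits.items()}
```

For this particular tensor the flattening along 12|34 has rank 4, while the other two have rank 16, so only the true inner split satisfies the rank condition $\mathrm{rk} \le m$.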

Remark 4.6. Theorem 4.4 also says that the intersection $U_T \cap U_{T'}$ is empty for
any $T \neq T' \in \mathcal{T}$. However, there exist points in $V(T) \cap V(T')$ for any $T \neq T'$.
Indeed, it is enough to consider $\Psi_T(A)$ where $A$ is the no-mutation presentation;
then $\Psi_T(A)$ lies in $V(T')$ for all $T'$. This proves that $\bigcap_T V(T)$ is not empty, but one
can also prove that, if $n \ge 5$, for any two different tree topologies $T_1, T_2$ one has
$V(T_1) \cap V(T_2) \neq \bigcap_T V(T)$.

In the next Corollary we give an open subset $U$ defined intrinsically from the
ambient space $\mathcal{L}$ such that $U \cap \bigcup_T V(T) = \bigcup_T U_T$. This is relevant for biological
applications because then we will be able to check whether the given data point lies
in (or rather is close to) $\bigcup_T U_T$. From now on let $B$ be the set of all bipartitions of
$\{v_1, \ldots, v_n\}$.

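Corollary 4.7. Let $U = \bigcup_{T \in \mathcal{T}} \bigcap_{\beta \in B} (\mathcal{L} \setminus D^\beta_{< m_{\beta,T}})$. Then

\[ U \cap \bigcup_{T \in \mathcal{T}} V(T) = \bigcup_{T \in \mathcal{T}} U_T \]

and if $p$ is a point in $U \cap \bigcup_{T \in \mathcal{T}} V(T)$ and $T_0$ is any tree in $\mathcal{T}$, then $p$ belongs to
$V(T_0)$ if and only if $p$ belongs to the set of zeroes $Z(\sum_{e \in E(T_0)} I_e)$.

Proof. We just need to prove that $U \cap (\bigcup_{T \in \mathcal{T}} V(T)) = \bigcup_{T \in \mathcal{T}} U_T$ because the other
assertion follows from Theorem 4.4.

We have $U \cap (\bigcup_{T \in \mathcal{T}} V(T)) = \bigcup_{T, T'} V(T) \cap (\bigcap_{\beta} \mathcal{L} \setminus D^\beta_{< m_{\beta,T'}})$. If $T \neq T'$ this
intersection is empty, as we can see by taking $\beta$ an edge split of $T$ but not of $T'$.
Hence we obtain $U \cap (\bigcup_{T \in \mathcal{T}} V(T)) = \bigcup_T V(T) \cap (\bigcap_\beta \mathcal{L} \setminus D^\beta_{< m_{\beta,T}})$, which is precisely
$\bigcup_T U_T$. □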



In terms of ideals, Theorem 4.4 says the following:
Corollary 4.8. Let $R$ be the polynomial ring of $\mathcal{L}$ and let $f$ be any element in

\[ \Big( \sum_{T \in \mathcal{T}} \bigcap_{\beta \in B} I(D^\beta_{< m_{\beta,T}}) \Big) \setminus \bigcap_{T} I(T). \]

Then, the following equality holds in the localized ring $\big(R / \bigcap_T I(T)\big)_f$:

\[ \Big( I(T_0) / \bigcap_T I(T) \Big)_f = \Big( \mathrm{rad}\big( \sum_{e \in E(T_0)} I_e \big) / \bigcap_T I(T) \Big)_f. \]

Proof. If we are given an $f$ as above, then $U_f := \mathcal{L} \setminus \{f = 0\}$ is contained
inside the open set $U$ defined in Corollary 4.7. Indeed, an $f$ as above is contained
inside $\mathrm{rad}(\sum_{T \in \mathcal{T}} \bigcap_\beta I(D^\beta_{< m_{\beta,T}}))$, which is equal to $I(\bigcap_T \bigcup_\beta D^\beta_{< m_{\beta,T}})$. Therefore
$\bigcap_T \bigcup_\beta D^\beta_{< m_{\beta,T}} \subset \{f = 0\}$ and $U_f \subset \mathcal{L} \setminus \bigcap_T \bigcup_\beta D^\beta_{< m_{\beta,T}} = U$.

In particular, $U_f \cap (\bigcup_T V(T))$ is contained inside $\bigcup_T U_T$. Therefore in $U_f$ we still
have that the variety $V(T_0)$ is defined inside $\bigcup_{T \in \mathcal{T}} V(T)$ by $\sum_{e \in E(T_0)} I_e$. Hence in
terms of ideals in $R_f$ we obtain the equality above. □

We do not know whether $\sum_{e \in E(T_0)} I_e$ is a radical ideal, so we cannot remove rad
from the expression above. We pose the following question:

Question 4.9. Given a set $S$ of compatible splits, is $\sum_{e \in S} I_e$ radical?

Remark 4.10. In order to check whether Theorem 4.4 can be applied to a given
data point $p \in \mathcal{L}$, it is enough to check that $f(p) \neq 0$ for a generic $f$ in

\[ \Big( \sum_{T \in \mathcal{T}} \bigcap_{\beta \in B} I(D^\beta_{< m_{\beta,T}}) \Big) \setminus \bigcap_{T} I(T). \]

Such a polynomial $f$ should be chosen a priori, so that when dealing with data one
does not need to compute this ideal.

Remark 4.11. It is interesting to explore whether $U_T$ can be defined by a complete
intersection in the sense of [CFS08]. This would reduce the number of generators
of $I_e$ to be used in phylogenetic reconstruction. However, this is another issue on
which we plan to work in the future.

Although the degrees of a set of generators of the ideal of a phylogenetic tree
evolving under the general Markov model or under the strand symmetric model are
not known, Theorem 4.4 allows us to give the degrees of those invariants that are
relevant in phylogenetic reconstruction. It is worth highlighting that these degrees
do not depend on the number of leaves but only on the model and can be computed
a priori (see the next sections for the precise examples of evolutionary models).
Corollary 4.12. Let $(G, W)$ be an equivariant evolutionary model and let $m =
(m_1, \ldots, m_s)$ be defined as in section 3. Then, for any tree topology on any number
of leaves, the polynomials that are relevant for recovering the tree topology in
phylogenetics have degrees in $\{m_1 + 1, \ldots, m_s + 1\}$. In particular, the relevant phylogenetic
invariants for the following evolutionary models have degrees:

• 5 for the general Markov model. 

• 3 for the strand symmetric model. 

• 2 for the Kimura 3-parameter model. 

• 1 or 2 for the Kimura 2-parameter model. 

• 1 or 2 for the Jukes-Cantor model. 
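
The degrees listed in Corollary 4.12 are obtained by adding 1 to each entry of the multiplicity vector $m$ computed for each model in Section 5. As a quick check, here is a small Python sketch (the dictionary keys and the function name are ours; the vectors $m$ are those stated in Sections 5.1 to 5.5):

```python
def relevant_degrees(m):
    """Degrees m_t + 1 of the minors attached to the blocks of the flattening."""
    return sorted({m_t + 1 for m_t in m})


# Multiplicity vectors m as given in Section 5 for each model.
multiplicity = {
    "general Markov": (4,),
    "strand symmetric": (2, 2),
    "Kimura 3-parameter": (1, 1, 1, 1),
    "Kimura 2-parameter": (1, 0, 1, 0, 1),
    "Jukes-Cantor": (1, 0, 0, 1, 0),
}
degrees = {model: relevant_degrees(m) for model, m in multiplicity.items()}
# {'general Markov': [5], 'strand symmetric': [3], 'Kimura 3-parameter': [2],
#  'Kimura 2-parameter': [1, 2], 'Jukes-Cantor': [1, 2]}
```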



5. Examples 

In this section, we study some well-known evolutionary models in phylogenetics.
Let $B = \{A, C, G, T\}$ be the set of the four nucleotides and take $W = \langle A, C, G, T \rangle_{\mathbb{C}} \cong \mathbb{C}^4$
with the bilinear form $(\cdot \mid \cdot)_W$ that makes $B$ orthonormal. We consider the group
of permutations of 4 elements,

\[ \mathfrak{S}_4 = \mathrm{Sym}\{B\}. \]

It is generated by $g_1 = (id)$, $g_2 = (AC)$, $g_3 = (ACG)$, $g_4 = (ACGT)$ and $g_5 = (AC)(GT)$,
which correspond to the five conjugacy classes of $\mathfrak{S}_4$. We work with the natural
permutation linear representation $\rho : \mathfrak{S}_4 \to GL(W)$ given by permuting the coordinates
of $W$:



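\[
g_1 \mapsto \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix}, \quad
g_2 \mapsto \begin{pmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&1&0\\ 0&0&0&1 \end{pmatrix}, \quad
g_3 \mapsto \begin{pmatrix} 0&0&1&0\\ 1&0&0&0\\ 0&1&0&0\\ 0&0&0&1 \end{pmatrix},
\]
\[
g_4 \mapsto \begin{pmatrix} 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0 \end{pmatrix}, \quad
g_5 \mapsto \begin{pmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{pmatrix}.
\]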

Write $\chi = \mathrm{Tr}(\rho(\cdot))$ for the character associated to it. We shall consider different
subgroups of $\mathfrak{S}_4$, each one of them giving rise to a different equivariant model,
according to the following diagram (we use the following shortenings: GMM for the
general Markov model, K81 for the Kimura 3 parameter model, K80 for the Kimura
2 parameter model, CS05 for the strand symmetric model and JC69 for the
Jukes-Cantor model):



    Subgroup G of S_4            Model
    {id}                         GMM
    <(AT)(CG)>                   CS05
    <(AC)(GT), (AG)(CT)>         K81
    <(ACGT), (AG)>               K80
    S_4                          JC69



Our aim here is to describe in a unified fashion the edge invariants associated to
these models for the case of a quartet tree topology $T$, with leaves $v_1, v_2, v_3, v_4$.
Write $e = L_1 \mid L_2$ for the edge split corresponding to $e$, so that $L_1 = \{v_1, v_2\}$ and
$L_2 = \{v_3, v_4\}$.

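Remark 5.1. When the subgroup $G \subset \mathfrak{S}_4$ is abelian, the usual product of complex
numbers induces on $\Omega_G$ a group structure. Then, if $\{u^t_1, u^t_2, \ldots\}$ is a basis for
$W[\omega_t]$, for every $\omega_t \in \Omega_G$, we have that

\[ \{u^{t_1}_{h_1} \otimes \ldots \otimes u^{t_l}_{h_l} \mid \omega_{t_1} \cdots \omega_{t_l} = \omega_t\} \]

is a $\mathbb{C}$-basis for $(\otimes^l W)[\omega_t]$.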

5.1. General Markov model. As a first example, consider the trivial subgroup
$\{1\} \subset \mathfrak{S}_4$. The corresponding equivariant model is the general Markov model,
which is the most general model in the Felsenstein hierarchy (see Ch. 4 in [PS05]).
Invariants for this model have been studied by Allman and Rhodes in [AR03], [AR07].
In this case, there is only one irreducible representation $\omega : G \to \mathbb{C}$ defined by
mapping (1) to 1. The character table is

\[
\begin{array}{c|c}
 & id \\ \hline
\omega & 1 \\ \hline
\chi & 4
\end{array}
\]

It follows that $\chi = 4\omega$. Keeping the notation introduced in 2.1, we have $m = (4)$
and $W = W[\omega] \cong N_\omega \otimes \mathbb{C}^4$.
Now, for the case of four leaves, we have $\chi^2 = 16\omega$ and $m(2) = (16)$. Then, the
ideal $I_e$ is defined by the condition

\[ \mathrm{rk}(M) \le (4) \]

where $M \in \mathrm{Hom}_G((W \otimes W)[\omega], (W \otimes W)[\omega]) = \mathrm{Hom}_{\mathbb{C}}(\mathbb{C}^{16}, \mathbb{C}^{16})$ is a matrix of
indeterminates whose columns and rows are indexed by the set $\{X_1 \otimes X_2\}_{X_1, X_2 \in B}$.
The ideal $I_e$ obtained by imposing the above rank condition is generated by $\binom{16}{5}\binom{16}{5}$
polynomials of degree 5.
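
The generator counts quoted in this section all arise in the same way: a diagonal block of size $a \times a$ that must have rank at most $r$ contributes $\binom{a}{r+1}^2$ minors of size $r + 1$. A minimal Python sketch of this count (the function name is ours; block sizes are read from $m(2)$ and rank bounds from $m$):

```python
from math import comb


def count_minor_generators(m, m2):
    """Number of (m_t + 1)-minors over all diagonal blocks.

    Block t of the thin flattening is an m2[t] x m2[t] matrix and contributes
    comb(m2[t], m[t] + 1) ** 2 minors of size m[t] + 1.
    """
    return sum(comb(size, m_t + 1) ** 2 for m_t, size in zip(m, m2))


# General Markov model on four leaves: m = (4), m(2) = (16).
gmm_count = count_minor_generators((4,), (16,))  # comb(16, 5) ** 2 = 19079424
```

The same function applies unchanged to models whose flattening has several blocks.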

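5.2. Strand symmetric model. Take $G = \langle (AT)(CG) \rangle$, which is isomorphic to
$\mathbb{Z}/2\mathbb{Z}$. The equivariant matrices for this group have the following structure:

\[
\begin{pmatrix}
a & b & c & d \\
e & f & g & h \\
h & g & f & e \\
d & c & b & a
\end{pmatrix}
\]

The equivariant model associated to $G$ is the strand symmetric model introduced in
[CS05]. There are two irreducible characters $\omega_1, \omega_2$, and the character table is

\[
\begin{array}{c|cc}
 & id & (AT)(CG) \\ \hline
\omega_1 & 1 & 1 \\
\omega_2 & 1 & -1 \\ \hline
\chi & 4 & 0
\end{array}
\]

Notice that since $G$ is abelian, all the irreducible representations have dimension
one. It follows that $\chi = 2\omega_1 + 2\omega_2$. Thus, $m = (2, 2)$ and we have a decomposition
(cf. [FH91, Corollary 2.14])

\[ W = W[\omega_1] \oplus W[\omega_2], \]

where $W[\omega_1] = N_{\omega_1} \otimes \mathbb{C}^2$ and $W[\omega_2] = N_{\omega_2} \otimes \mathbb{C}^2$. Indeed, if we write

\[ u_1 = A + T, \quad u_2 = C + G, \quad v_1 = A - T, \quad v_2 = C - G, \]

we have

\[ W[\omega_1] = \langle u_1, u_2 \rangle_{\mathbb{C}}, \qquad W[\omega_2] = \langle v_1, v_2 \rangle_{\mathbb{C}}. \]

Now, we focus on the case of the tree with four leaves. We have $\chi^2 = 8\omega_1 + 8\omega_2$,
so $m(2) = (8, 8)$. Moreover, using that $G$ is abelian (see Remark 5.1),

\[ (W \otimes W)[\omega_1] = \langle u_1 \otimes u_1, u_1 \otimes u_2, u_2 \otimes u_1, u_2 \otimes u_2, v_1 \otimes v_1, v_1 \otimes v_2, v_2 \otimes v_1, v_2 \otimes v_2 \rangle, \]

\[ (W \otimes W)[\omega_2] = \langle u_1 \otimes v_1, u_1 \otimes v_2, u_2 \otimes v_1, u_2 \otimes v_2, v_1 \otimes u_1, v_1 \otimes u_2, v_2 \otimes u_1, v_2 \otimes u_2 \rangle. \]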
Then, the ideal $I_e$ is defined by the conditions

\[ \mathrm{rk} \begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix} \le (2, 2), \]

where $M_1 = (q_{xyzt})$ is the $8 \times 8$ matrix in which $x \otimes y$ and $z \otimes t$ both run over the
basis of $(W \otimes W)[\omega_1]$ given above, $M_2 = (q_{xyzt})$ is the $8 \times 8$ matrix in which $x \otimes y$
and $z \otimes t$ both run over the basis of $(W \otimes W)[\omega_2]$, and $q_{xyzt}$ are the coordinates in
the basis $\{x \otimes y \otimes z \otimes t\}$. We see that $I_e$ is generated by
$\binom{8}{3}\binom{8}{3} + \binom{8}{3}\binom{8}{3} = 6272$ polynomials of degree 3.

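5.3. Kimura 3-parameter model. Take $G = \langle (AC)(GT), (AG)(CT) \rangle$, which is
isomorphic to $\mathbb{Z}/2\mathbb{Z} \times \mathbb{Z}/2\mathbb{Z}$. The equivariant matrices for this group have the
following structure:

\[
\begin{pmatrix}
a & b & c & d \\
b & a & d & c \\
c & d & a & b \\
d & c & b & a
\end{pmatrix}
\]

In this case, the equivariant model is the Kimura 3-parameter model introduced in
[Kim81]. We write $\omega_A, \omega_C, \omega_G, \omega_T$ for the irreducible characters of $G$. The
corresponding table is

\[
\begin{array}{c|cccc}
 & id & (AC)(GT) & (AG)(CT) & (AT)(CG) \\ \hline
\omega_A & 1 & 1 & 1 & 1 \\
\omega_C & 1 & -1 & 1 & -1 \\
\omega_G & 1 & 1 & -1 & -1 \\
\omega_T & 1 & -1 & -1 & 1 \\ \hline
\chi & 4 & 0 & 0 & 0
\end{array}
\]

It follows that $\chi = \omega_A + \omega_C + \omega_G + \omega_T$ and so, $m = (1, 1, 1, 1)$ and

\[ W = W[\omega_A] \oplus W[\omega_C] \oplus W[\omega_G] \oplus W[\omega_T], \]

where $W[\omega_X] \cong N_{\omega_X}$ for each $X \in B$. In fact, if we write

(5.1)
\[
\begin{array}{ll}
\bar{A} = A + C + G + T & \bar{C} = A + C - G - T \\
\bar{G} = A - C + G - T & \bar{T} = A - C - G + T
\end{array}
\]

we have

\[ W[\omega_A] = \langle \bar{A} \rangle, \quad W[\omega_C] = \langle \bar{C} \rangle, \quad W[\omega_G] = \langle \bar{G} \rangle, \quad W[\omega_T] = \langle \bar{T} \rangle. \]

We remark that the basis $\{\bar{A}, \bar{C}, \bar{G}, \bar{T}\}$ is the image of $\{A, C, G, T\}$ by the Fourier
transform described in [CFS08] or [SS05].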
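
Since each isotypic component is one-dimensional here, an equivariant matrix of the shape above becomes diagonal in the basis (5.1). The following Python sketch checks this on one example with exact arithmetic; it is only an illustration, the helper names are ours, and the entries are assumed to be integers or `Fraction` values.

```python
from fractions import Fraction

# Rows: coordinates of A-bar, C-bar, G-bar, T-bar from (5.1) in the basis A, C, G, T.
H = [
    [1, 1, 1, 1],
    [1, 1, -1, -1],
    [1, -1, 1, -1],
    [1, -1, -1, 1],
]


def k81_matrix(a, b, c, d):
    """Equivariant matrix for the Kimura 3-parameter model."""
    return [[a, b, c, d], [b, a, d, c], [c, d, a, b], [d, c, b, a]]


def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
        for i in range(len(X))
    ]


def in_fourier_basis(M):
    """Matrix of M in the basis (5.1). H is symmetric and H * H = 4 * Id,
    so the change of basis is H M H / 4."""
    return [[Fraction(v, 4) for v in row] for row in matmul(matmul(H, M), H)]


D = in_fourier_basis(k81_matrix(5, 1, 2, 3))
# D is diagonal with entries a+b+c+d, a+b-c-d, a-b+c-d, a-b-c+d = 11, 1, 3, 5.
```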

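Since $\chi^2 = 4\omega_A + 4\omega_C + 4\omega_G + 4\omega_T$, we have $m(2) = (4, 4, 4, 4)$. In virtue of Remark
5.1,

\[ (W \otimes W)[\omega_A] = \langle \bar{A} \otimes \bar{A}, \bar{C} \otimes \bar{C}, \bar{G} \otimes \bar{G}, \bar{T} \otimes \bar{T} \rangle \]
\[ (W \otimes W)[\omega_C] = \langle \bar{A} \otimes \bar{C}, \bar{C} \otimes \bar{A}, \bar{G} \otimes \bar{T}, \bar{T} \otimes \bar{G} \rangle \]
\[ (W \otimes W)[\omega_G] = \langle \bar{A} \otimes \bar{G}, \bar{C} \otimes \bar{T}, \bar{G} \otimes \bar{A}, \bar{T} \otimes \bar{C} \rangle \]
\[ (W \otimes W)[\omega_T] = \langle \bar{A} \otimes \bar{T}, \bar{C} \otimes \bar{G}, \bar{G} \otimes \bar{C}, \bar{T} \otimes \bar{A} \rangle \]

Then, $I_e$ is given by the conditions

(5.2)
\[
\mathrm{rk} \begin{pmatrix} M_A & 0 & 0 & 0 \\ 0 & M_C & 0 & 0 \\ 0 & 0 & M_G & 0 \\ 0 & 0 & 0 & M_T \end{pmatrix} \le (1, 1, 1, 1)
\]

where $M_Z \in M_{4,4}$ for all $Z \in B$, that is

\[
M_A = \begin{pmatrix}
q_{AAAA} & q_{AACC} & q_{AAGG} & q_{AATT} \\
q_{CCAA} & q_{CCCC} & q_{CCGG} & q_{CCTT} \\
q_{GGAA} & q_{GGCC} & q_{GGGG} & q_{GGTT} \\
q_{TTAA} & q_{TTCC} & q_{TTGG} & q_{TTTT}
\end{pmatrix},
\quad
M_C = \begin{pmatrix}
q_{ACAC} & q_{ACCA} & q_{ACGT} & q_{ACTG} \\
q_{CAAC} & q_{CACA} & q_{CAGT} & q_{CATG} \\
q_{GTAC} & q_{GTCA} & q_{GTGT} & q_{GTTG} \\
q_{TGAC} & q_{TGCA} & q_{TGGT} & q_{TGTG}
\end{pmatrix},
\]
\[
M_G = \begin{pmatrix}
q_{AGAG} & q_{AGCT} & q_{AGGA} & q_{AGTC} \\
q_{CTAG} & q_{CTCT} & q_{CTGA} & q_{CTTC} \\
q_{GAAG} & q_{GACT} & q_{GAGA} & q_{GATC} \\
q_{TCAG} & q_{TCCT} & q_{TCGA} & q_{TCTC}
\end{pmatrix},
\quad
M_T = \begin{pmatrix}
q_{ATAT} & q_{ATCG} & q_{ATGC} & q_{ATTA} \\
q_{CGAT} & q_{CGCG} & q_{CGGC} & q_{CGTA} \\
q_{GCAT} & q_{GCCG} & q_{GCGC} & q_{GCTA} \\
q_{TAAT} & q_{TACG} & q_{TAGC} & q_{TATA}
\end{pmatrix},
\]

where $q_{X_1 X_2 X_3 X_4}$ are the coordinates in the basis $\{\bar{X}_1 \otimes \bar{X}_2 \otimes \bar{X}_3 \otimes \bar{X}_4\}_{X_i \in B}$. The ideal $I_e$
obtained by imposing the rank conditions of (5.2) is generated by
$\binom{4}{2}\binom{4}{2} + \binom{4}{2}\binom{4}{2} + \binom{4}{2}\binom{4}{2} + \binom{4}{2}\binom{4}{2} = 144$ quadrics. However, at any point of $V(I_e)$ the variety is locally
defined by 36 quadrics (see [CFS08, Example 4.9]).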
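
The multiplicity vectors $m$ and $m(2)$ used above can be recomputed from the character table as inner products $\frac{1}{|G|} \sum_{g \in G} \chi(g)^n \omega(g)$, where $\chi(g)$ is the number of nucleotides fixed by $g$. The Python sketch below does this for the group of this subsection; it is an illustration only, the names are ours, and it assumes real-valued characters (so no complex conjugation is needed) listed in the same order as the group elements.

```python
NUCLEOTIDES = "ACGT"


def permutation(*cycles):
    """Permutation of ACGT (as a dict) built from cycles such as 'AC', 'GT'."""
    perm = {x: x for x in NUCLEOTIDES}
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[0]):
            perm[a] = b
    return perm


def chi(perm):
    """Permutation character: number of fixed nucleotides."""
    return sum(1 for x in NUCLEOTIDES if perm[x] == x)


def multiplicities(group, irreducibles, power=1):
    """Multiplicity of each irreducible character in chi ** power."""
    values = [chi(g) ** power for g in group]
    result = []
    for omega in irreducibles:
        total = sum(v * w for v, w in zip(values, omega))
        assert total % len(group) == 0  # multiplicities are integers
        result.append(total // len(group))
    return tuple(result)


klein = [
    permutation(),
    permutation("AC", "GT"),
    permutation("AG", "CT"),
    permutation("AT", "CG"),
]
# Values of the four irreducible characters on the elements listed in `klein`.
klein_irreducibles = [(1, 1, 1, 1), (1, -1, 1, -1), (1, 1, -1, -1), (1, -1, -1, 1)]
m = multiplicities(klein, klein_irreducibles)            # (1, 1, 1, 1)
m2 = multiplicities(klein, klein_irreducibles, power=2)  # (4, 4, 4, 4)
```

For a non-abelian group the sum has to run over all group elements (or over conjugacy classes weighted by their sizes), which this sketch handles as long as `group` lists every element.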
5.4. Kimura 2-parameter model. Take $G = \langle (ACGT), (AG) \rangle$, which is isomorphic
to the dihedral group. The equivariant matrices for this group have the following
structure:

\[
\begin{pmatrix}
a & b & c & b \\
b & a & b & c \\
c & b & a & b \\
b & c & b & a
\end{pmatrix}
\]

The equivariant model is the Kimura 2-parameter model introduced in [Kim80].
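There are 5 irreducible characters $\omega_1, \omega_2, \omega_3, \omega_4, \omega$ and the corresponding table is

\[
\begin{array}{c|ccccc}
 & id & (ACGT) & (AG) & (AG)(CT) & (AC)(GT) \\ \hline
\omega_1 & 1 & 1 & 1 & 1 & 1 \\
\omega_2 & 1 & 1 & -1 & 1 & -1 \\
\omega_3 & 1 & -1 & 1 & 1 & -1 \\
\omega_4 & 1 & -1 & -1 & 1 & 1 \\
\omega & 2 & 0 & 0 & -2 & 0 \\ \hline
\chi & 4 & 0 & 2 & 0 & 0
\end{array}
\]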

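Notice that $G$ is not abelian and that the irreducible representation $\omega$ is 2-dimensional.
It follows that $\chi = \omega_1 + \omega_3 + \omega$ and so, $m = (1, 0, 1, 0, 1)$ and

\[ W = W[\omega_1] \oplus W[\omega_3] \oplus W[\omega], \]

where

\[ W[\omega_1] \cong N_{\omega_1}, \quad W[\omega_3] \cong N_{\omega_3}, \quad W[\omega] \cong N_\omega. \]

In fact, with the notation of (5.1) we have

\[ W[\omega_1] = \langle \bar{A} \rangle, \quad W[\omega_3] = \langle \bar{G} \rangle, \quad W[\omega] = \langle \bar{C}, \bar{T} \rangle. \]

Now, we consider the case of four leaves. We have $\chi^2 = 3\omega_1 + \omega_2 + 3\omega_3 + \omega_4 + 4\omega$,
so $m(2) = (3, 1, 3, 1, 4)$. If $\psi \in \mathcal{L}(T)^G$, then

\[
Tf_e(\psi) = \begin{pmatrix}
S_1 & 0 & 0 & 0 & 0 \\
0 & S_2 & 0 & 0 & 0 \\
0 & 0 & S_3 & 0 & 0 \\
0 & 0 & 0 & S_4 & 0 \\
0 & 0 & 0 & 0 & S
\end{pmatrix} \in M_{m(2), m(2)}
\]

where

\[ S_1 \in M_{3,3}, \quad S_2 \in M_{1,1}, \quad S_3 \in M_{3,3}, \quad S_4 \in M_{1,1}, \quad S \in M_{4,4}. \]

Then, the ideal $I_e$ is given by the condition

(5.3) $\mathrm{rk}\, Tf_{L_1|L_2}(\psi) \le (1, 0, 1, 0, 1)$.

By imposing these rank conditions to the matrix $Tf_{L_1|L_2}(\psi)$ we obtain
$\binom{3}{2}\binom{3}{2} + \binom{1}{1}\binom{1}{1} + \binom{3}{2}\binom{3}{2} + \binom{1}{1}\binom{1}{1} + \binom{4}{2}\binom{4}{2} = 9 + 1 + 9 + 1 + 36 = 56$ invariants: 54 of them
are quadrics and 2 of them are linear invariants.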
5.5. Jukes-Cantor model. Finally, we take the whole group of permutations $\mathfrak{S}_4$. 
The equivariant matrices for this group have the following structure: 

\[
\begin{pmatrix}
a & b & b & b \\
b & a & b & b \\
b & b & a & b \\
b & b & b & a
\end{pmatrix}
\]

The equivariant model associated to it is the Jukes-Cantor model introduced in 
[JC69]. The group $\mathfrak{S}_4$ has five irreducible characters $\{\omega_i\}_{i=0,\dots,4}$ (see §2.3 of [FH91]) 
and the following character table: 



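\[
\begin{array}{c|rrrrr}
\mathfrak{S}_4 & \mathrm{id} & (AC) & (ACG) & (ACGT) & (AC)(GT) \\ \hline
\omega_0 & 1 & 1 & 1 & 1 & 1 \\
\omega_1 & 1 & -1 & 1 & -1 & 1 \\
\omega_2 & 2 & 0 & -1 & 0 & 2 \\
\omega_3 & 3 & 1 & 0 & -1 & -1 \\
\omega_4 & 3 & -1 & 0 & 1 & -1 \\ \hline
\chi & 4 & 2 & 1 & 0 & 0
\end{array}
\]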

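It follows that 

\[ \chi = \omega_0 + \omega_3, \]

that is, $\chi$ is the sum of the trivial and the standard representations. We have 
$m = (1, 0, 0, 1, 0)$. Thus, there is a decomposition 

\[ W = W[\omega_0] \oplus W[\omega_3], \]

where 

\[ W[\omega_0] = N_{\omega_0} \otimes \mathbb{C}^{m_0} \cong N_{\omega_0}, \quad W[\omega_3] = N_{\omega_3} \otimes \mathbb{C}^{m_3} \cong N_{\omega_3}. \]

In fact, with the notation of (5.1), we have 

\[ W[\omega_0] = \langle \bar{A} \rangle, \quad W[\omega_3] = \langle \bar{C}, \bar{G}, \bar{T} \rangle. \]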
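The decomposition of $\chi$ can also be checked by brute force over the 24 permutations, without knowing the class sizes in advance. The sketch below assumes the states are indexed $0, \dots, 3$ and that each column of the character table corresponds to one cycle type; the helper names are ours. The same sum with the fixed-point count squared gives the multiplicities of $\chi^2$, which are the ones needed for four leaves.

```python
from itertools import permutations

def cycle_type(p):
    """Cycle lengths of a permutation (tuple of images), largest first."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

# Column of the character table for each cycle type, in the order
# id, (AC), (ACG), (ACGT), (AC)(GT).
COLUMN = {(1, 1, 1, 1): 0, (2, 1, 1): 1, (3, 1): 2, (4,): 3, (2, 2): 4}

TABLE = [
    [1, 1, 1, 1, 1],     # w0, trivial
    [1, -1, 1, -1, 1],   # w1, sign
    [2, 0, -1, 0, 2],    # w2
    [3, 1, 0, -1, -1],   # w3, standard
    [3, -1, 0, 1, -1],   # w4
]

GROUP = list(permutations(range(4)))

def fixed_points(p):
    """Value of the permutation character chi at p."""
    return sum(1 for i, image in enumerate(p) if i == image)

def multiplicities(power):
    """Multiplicities of w0..w4 in chi**power, summing over all of S4."""
    result = []
    for row in TABLE:
        total = sum(fixed_points(g) ** power * row[COLUMN[cycle_type(g)]]
                    for g in GROUP)
        assert total % len(GROUP) == 0  # multiplicities must be integers
        result.append(total // len(GROUP))
    return result

print(multiplicities(1))  # -> [1, 0, 0, 1, 0]
print(multiplicities(2))  # -> [2, 0, 1, 3, 1]
```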
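The ideal $I_e$ is generated by the $(m_j + 1)$-minors of the $j$-th box of $Tf_e(\psi)$ with 
$j = 1, 2, \dots, 5$; here $\dim W[\omega_0] = 1$ and $\dim W[\omega_3] = 3$. On the other hand, it is straightforward to see that $\chi^2 = 2\omega_0 + \omega_2 + 3\omega_3 + \omega_4$, so $m(2) = (2, 0, 1, 3, 1)$ and we have 

\[
\begin{aligned}
(W \otimes W)[\omega_0] &= \langle \bar{q}_{AA},\ \bar{q}_{CC} + \bar{q}_{GG} + \bar{q}_{TT} \rangle \\
(W \otimes W)[\omega_2] &= \langle \bar{q}_{CC} - \bar{q}_{GG},\ \bar{q}_{CC} - \bar{q}_{TT} \rangle \\
(W \otimes W)[\omega_3] &= \langle \bar{q}_{AC},\ \bar{q}_{AG},\ \bar{q}_{AT},\ \bar{q}_{CA},\ \bar{q}_{GA},\ \bar{q}_{TA},\ \bar{q}_{CT} + \bar{q}_{TC},\ \bar{q}_{CG} + \bar{q}_{GC},\ \bar{q}_{GT} + \bar{q}_{TG} \rangle \\
(W \otimes W)[\omega_4] &= \langle \bar{q}_{CT} - \bar{q}_{TC},\ \bar{q}_{CG} - \bar{q}_{GC},\ \bar{q}_{GT} - \bar{q}_{TG} \rangle
\end{aligned}
\]

and $\bar{q}_{XY} = \bar{X} \otimes \bar{Y}$ for any $X, Y \in B$. Now, if $\psi \in \mathcal{L}(T)^{\mathfrak{S}_4}$ we have 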

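\[
Tf_e(\psi) =
\begin{pmatrix}
S_0 & & & \\
 & S_2 & & \\
 & & S_3 & \\
 & & & S_4
\end{pmatrix}
\in M_{m(2),m(2)},
\]

where 

\[ S_0 \in M_{2,2}, \quad S_2 \in M_{1,1}, \quad S_3 \in M_{3,3}, \quad S_4 \in M_{1,1}. \]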

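For instance, we have 

\[
S_0 =
\begin{pmatrix}
\bar{q}_{AAAA} & \bar{q}_{AACC} + \bar{q}_{AAGG} + \bar{q}_{AATT} \\[4pt]
\bar{q}_{CCAA} + \bar{q}_{GGAA} + \bar{q}_{TTAA} &
\begin{array}{l}
\bar{q}_{CCCC} + \bar{q}_{GGCC} + \bar{q}_{TTCC} + {} \\
\bar{q}_{CCGG} + \bar{q}_{GGGG} + \bar{q}_{TTGG} + {} \\
\bar{q}_{CCTT} + \bar{q}_{GGTT} + \bar{q}_{TTTT}
\end{array}
\end{pmatrix}
\]

while 

\[ S_2 = \left( \bar{q}_{CCCC} - \bar{q}_{CCGG} - \bar{q}_{GGCC} + \bar{q}_{GGGG} \right). \]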

Now, given $\psi \in \mathcal{L}(T)^{\mathfrak{S}_4}$, we have $\psi \in V(T)$ if and only if 

\[ \operatorname{rk} Tf_e(\psi) \leq m. \tag{5.4} \]

By imposing these rank conditions on the matrix $Tf_e(\psi)$ we obtain $\binom{2}{2}\binom{2}{2} + \binom{1}{1}\binom{1}{1} + \binom{3}{2}\binom{3}{2} + \binom{1}{1}\binom{1}{1} = 12$ phylogenetic invariants $\{f_i\}_{i=1,\dots,12}$: 

1. $f_1, \dots, f_{10}$ have degree 2 and are obtained by the conditions $\operatorname{rk}(S_0), \operatorname{rk}(S_3) \leq 1$. 
2. $f_{11}, f_{12}$ have degree one and are obtained by the conditions $S_2, S_4 = 0$. These 
two invariants are equivalent to Lake's invariants (cf. [Lak87]). 
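To make the ten quadrics concrete, one can list the $2 \times 2$ minors of a $2 \times 2$ block and of a $3 \times 3$ block and check that they vanish on rank-one matrices. The sketch below uses hypothetical numerical blocks built as outer products; they only stand in for $S_0$ and $S_3$ and are not derived from any data.

```python
from itertools import combinations

def minors_2x2(matrix):
    """All 2x2 minors of a square matrix given as a list of rows."""
    n = len(matrix)
    return [
        matrix[r1][c1] * matrix[r2][c2] - matrix[r1][c2] * matrix[r2][c1]
        for r1, r2 in combinations(range(n), 2)
        for c1, c2 in combinations(range(n), 2)
    ]

def outer(u, v):
    """Outer product of two vectors: a matrix of rank at most one."""
    return [[x * y for y in v] for x in u]

S0 = outer([1, 2], [3, 5])        # hypothetical rank-one 2x2 block
S3 = outer([1, 2, 3], [4, 5, 6])  # hypothetical rank-one 3x3 block

quadrics = minors_2x2(S0) + minors_2x2(S3)
print(len(quadrics))                         # -> 10
print(all(value == 0 for value in quadrics))  # -> True
```

The count is $1 + 9 = 10$, matching the number of degree 2 invariants in the list.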